An Unusual Retrotransposon from the Yeast Candida albicans

RELATED APPLICATIONS

Reference is made to U.S. application Serial No. 60/106,342, filed October 30, 1998.
U.S. application Serial No. 60/106,342 and all documents cited therein ("USSN
60/106,342 cited documents") and all documents referenced or cited in USSN
60/106,342 cited documents are hereby incorporated herein by reference. In addition,
all documents cited herein ("herein cited documents") and all documents cited or
referenced in herein cited documents are likewise incorporated herein by reference.

FIELD OF INVENTION

The invention relates to a novel retrotransposon from the yeast Candida albicans. In
particular, the invention relates to a retrotransposon, pCal, which belongs to the
Ty1/copia group.

INTRODUCTION

Candida albicans is an asexual yeast species that is the major fungal pathogen of
humans. Although it is commonly found as a harmless commensal organism,
inhabiting mucosal membranes and the digestive tract, it can cause superficial
infections, such as oral thrush, in otherwise healthy people and severe, often fatal,
systemic infections in immunocompromised patients. The recent increased use of
immunosuppressive treatments and the increased incidence of immunosuppressive
diseases, such as HIV infection, have made C. albicans infections of increasing
medical significance (Odds 1988). There is significant strain variation within this
species, potentially affecting virulence, and mobile retroelements have been
suggested as one source of this variation.

Retroelements are a widespread family of sequences that can replicate via the reverse 
transcription of single-stranded RNA into double-stranded DNA, or are assumed to 
have arisen in this way. Two major types of retroelement are the retroviruses, such 
as HIV1 and Moloney murine leukaemia virus, and the retrotransposons such as Ty1 
and Ty3 from Saccharomyces cerevisiae (Boeke and Sandmeyer 1991). The 
structures and lifecycles of retrotransposons and retroviruses are very similar. The 
major difference between the two groups is that the retroviruses can form infectious 
virus particles which can be transmitted between cells and between individuals. 
Retrotransposons can form intracellular virus-like particles (VLPs) but they lack the 
genes coding for the viral envelope so the VLPs are usually confined to the one cell. 

Similarly to retroviruses, retrotransposons consist of an internal domain flanked by 
long terminal direct repeats (LTRs). In Ty1, for example, the LTRs are about 335bp in 
length and the internal domain is about 5.3kb long. The internal region has two long 
open reading frames (ORFs) homologous to the gag and pol ORFs of retroviruses. The 
gag gene encodes the structural proteins which make up the VLP while, downstream, 
the pol gene encodes the enzymes required for reverse transcription and integration - 
protease, integrase, reverse transcriptase and RNase H. The LTRs contain the 
promoter and the transcription termination signals and are functionally divided into 
three regions - U3, R and U5. Transcription proceeds from the U3/R boundary in the 
left LTR to the R/U5 boundary in the right LTR to produce an RNA molecule which 
has the R region repeated at each end. Translation of this terminally redundant mRNA 
is usually regulated to ensure that the structural proteins of the VLP (gag) are 
produced in much higher quantities than the enzymes (pol). This is because large
quantities of the gag proteins are required for the assembly of the VLP but only 
catalytic quantities of the pol enzymes are required. 
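The terminal redundancy of the transcript can be illustrated with the minimal Python sketch below, which assembles a toy provirus from U3, R and U5 segments and extracts the RNA running from the U3/R boundary of the left LTR to the R/U5 boundary of the right LTR. The `LTR` class, the `provirus` and `transcript` helpers and all segment sequences are hypothetical illustrations, not Ty1 or pCal sequences; for simplicity the RNA is written with DNA letters (T for U).

```python
from dataclasses import dataclass


@dataclass
class LTR:
    """A long terminal repeat divided into its three functional regions."""
    u3: str
    r: str
    u5: str

    def seq(self) -> str:
        return self.u3 + self.r + self.u5


def provirus(ltr: LTR, internal: str) -> str:
    """DNA form: two identical LTRs flanking the internal domain."""
    return ltr.seq() + internal + ltr.seq()


def transcript(ltr: LTR, internal: str) -> str:
    """RNA from the U3/R boundary (left LTR) to the R/U5 boundary (right LTR).

    The left U3 and the right U5 are not transcribed, so R appears at both ends.
    """
    return ltr.r + ltr.u5 + internal + ltr.u3 + ltr.r


# Toy sequences (hypothetical, for illustration only)
toy_ltr = LTR(u3="AAAACCCC", r="GCGCGC", u5="TTTTGG")
toy_internal = "ATGCATGCATGC"
rna = transcript(toy_ltr, toy_internal)
print(rna)
print(rna.startswith(toy_ltr.r) and rna.endswith(toy_ltr.r))
```

The transcript is shorter than the DNA form by exactly one U3 and one U5, which is why reverse transcription must regenerate both from the terminally redundant RNA.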
The most common method of down-regulating translation of the pol ORF is to place it
out of frame relative to the upstream gag ORF. A rare, programmed ribosomal
frameshift is thus required for translation of the pol ORF. A number of
retrotransposons employ a +1 frameshift. Ty1 achieves this by tRNA slippage, while
the Ty3 mechanism involves the 'skipping' of a base. The Ty1 slippage mechanism
involves a seven-base sequence, CUU AGG C. It is thought that a tRNA-Leu(UAG),
which can recognise all six leucine codons, slips forward one base from CUU-Leu to
UUA-Leu during a translational pause caused by a rare tRNA-Arg(CCU) (2). The Ty3
+1 frameshift also involves a seven-base sequence, GCG AGU U. An alanine-valine
sequence (encoded by GCG-GUU) is produced but tRNA slippage is not involved. It is 
thought that out-of-frame aminoacyl-tRNA binding or four-base decoding is 
responsible. Frameshifting is stimulated by the low availability of the tRNA decoding 
the AGU-Ser codon and also by the 12 nucleotides downstream of the AGU codon. 
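The net effect of both +1 mechanisms on these seven-base sites can be sketched in Python: after the first codon, reading resumes one base further downstream, so the Ty1 site yields Leu-Gly instead of Leu-Arg, and the Ty3 site yields the Ala-Val described above instead of Ala-Ser. The helper names are hypothetical, the codon table covers only the six codons used here, and the model deliberately ignores how the shift happens (slippage versus base skipping).

```python
# Minimal model of the *net* codon outcome of a +1 frameshift (illustrative only).
# It does not model the molecular mechanism (tRNA slippage vs. base skipping).

CODON_TO_AA = {  # only the codons used in the examples below
    "CUU": "Leu", "AGG": "Arg", "GGC": "Gly",
    "GCG": "Ala", "AGU": "Ser", "GUU": "Val",
}


def codons(seq: str, start: int = 0) -> list[str]:
    """Split seq into complete codons beginning at index `start`."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


def plus_one_frameshift(seq: str, shift_at: int) -> list[str]:
    """Read in frame up to `shift_at`, then resume one base further downstream."""
    return codons(seq[:shift_at]) + codons(seq, shift_at + 1)


for name, site in [("Ty1", "CUUAGGC"), ("Ty3", "GCGAGUU")]:
    in_frame = [CODON_TO_AA[c] for c in codons(site)]
    shifted = [CODON_TO_AA[c] for c in plus_one_frameshift(site, 3)]
    print(name, "in frame:", in_frame, "after +1 shift:", shifted)
```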

Retrotransposons have also been found to use a -1 frameshift; an example is CfT-1 of
Cladosporium fulvum. Here the ribosome is thought to slip back one base on the
sequence AAAA slightly upstream of the gag termination codon.

An alternative method of down-regulation has been found in the copia
retrotransposon. Here the gag and pol ORFs are fused into one long continuous ORF,
but a splicing reaction usually occurs prior to translation to excise most of the pol 
region from the mRNA. Only occasionally is a full-length RNA translated with the 
concomitant production of the pol enzymes. 

Following translation, the retrotransposon proteins and RNA can assemble into a VLP. This
consists of a shell of gag proteins with the pol enzymes and genomic RNA packaged 
inside. The VLP is the site of reverse transcription. In general, the process of reverse 
transcription in retrotransposons is very similar to the well-characterised process of 
retroviral reverse transcription. Two important steps in the reverse transcription 
process are the priming of minus- and of plus-strand DNA synthesis. Minus-strand
synthesis is most commonly primed by a cytoplasmic tRNA (often initiator methionine
tRNA) which is packaged within the VLP along with the mRNA of the
retrotransposon. The retrotransposon has a region adjacent to the left LTR, known as
the minus-strand primer-binding site [(-)PBS], which is complementary to the 3' end
of this tRNA. The tRNA binds to the retrotransposon RNA at the (-)PBS and can then 
be used by reverse transcriptase as a primer for the synthesis of minus-strand DNA. 
Plus-strand synthesis is primed by a short purine-rich sequence, known as a 
polypurine tract (PPT), located just upstream of the right LTR. After minus-strand 
DNA synthesis has passed this sequence, the RNA is nicked between the PPT and the
LTR. The PPT RNA can then be used as a primer for the synthesis of the plus-strand. 
Reverse transcription is generally very inefficient; greater than 10% of cellular mRNA 
can be retrotransposon RNA yet the dsDNA form is not usually detectable by 
Southern blotting. 
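Locating a candidate PPT just upstream of the right LTR can be sketched as a scan for uninterrupted purine runs, as below. The function names, the 30-base search window and the 8-base minimum length are arbitrary assumptions chosen for illustration; natural PPTs may contain occasional pyrimidines, so a strict A/G rule is a simplification. The toy sequence is hypothetical.

```python
PURINES = frozenset("AG")


def purine_runs(seq: str):
    """Yield (start, end) half-open coordinates of maximal A/G runs."""
    start = None
    for i, base in enumerate(seq):
        if base in PURINES:
            if start is None:
                start = i
        elif start is not None:
            yield start, i
            start = None
    if start is not None:
        yield start, len(seq)


def candidate_ppt(seq: str, right_ltr_start: int, window: int = 30, min_len: int = 8):
    """Return the purine run nearest the right LTR within `window` bases upstream.

    `window` and `min_len` are illustrative thresholds, not published values.
    """
    offset = max(0, right_ltr_start - window)
    upstream = seq[offset:right_ltr_start]
    runs = [(s + offset, e + offset) for s, e in purine_runs(upstream) if e - s >= min_len]
    return max(runs, key=lambda run: run[1], default=None)


# Toy sequence: spacer, purine-rich tract, 2-nt spacer, start of a right LTR (hypothetical)
toy = "CCTTCC" + "AAGAGGGAGA" + "TC" + "TGTTCGAAAA"
ltr_start = toy.index("TGTTCG")
ppt = candidate_ppt(toy, ltr_start)
print(ppt, toy[ppt[0]:ppt[1]])
```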

Following the synthesis of the dsDNA form of the retrotransposon it may integrate at 
a new site within the host genome. This process is likely to involve a complex of the 
integrase enzyme associated with the two ends of the retrotransposon DNA. In a 
process which is not well understood the integrase complex must be released from 
the VLP, move into the nucleus and then insert the DNA into a new genomic site.
Studies with Ty1 and Ty3 have shown that the integration site-selection mechanisms 
of these retrotransposons are non-random and appear to be specifically adapted to 
avoid causing disruption to the host genome. 

Retrotransposons can be divided into three major groups based on their reverse
transcriptase sequences and the order of the genes within their pol ORFs. Members of 
the Ty3/gypsy group are the most closely related to the retroviruses and share a 
similar pol gene order - protease, reverse transcriptase, RNase H and integrase. 
Examples of these elements are Ty3 of S. cerevisiae, gypsy of Drosophila 
melanogaster, Tf1 of Schizosaccharomyces pombe and del of Lilium henryi. Members
of the Pao group, for example Pao of Bombyx mori and Tas of Ascaris lumbricoides,
have a similar pol gene order to Ty3/gypsy retrotransposons but can be distinguished
from them by their reverse transcriptase sequence. Ty1/copia elements are most
easily distinguished from Ty3/gypsy and Pao retrotransposons and retroviruses by the
gene order of the pol protein: protease, integrase, reverse transcriptase, RNase H.
This group includes Ty1 and Ty2 of S. cerevisiae, copia and 1731 of D.
melanogaster, Tst1 of Solanum tuberosum and Tnt1 of Nicotiana tabacum.
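The gene-order criterion above can be expressed as a small Python lookup. The two-letter domain labels and the function name are hypothetical conventions for this sketch; as the text notes, gene order alone cannot separate Ty3/gypsy, Pao and retroviral pol ORFs, which require reverse transcriptase sequence comparison.

```python
# Hypothetical domain labels: PR protease, IN integrase, RT reverse transcriptase, RH RNase H
TY1_COPIA_ORDER = ("PR", "IN", "RT", "RH")
TY3_GYPSY_LIKE_ORDER = ("PR", "RT", "RH", "IN")


def classify_by_pol_order(domains) -> str:
    """Assign a retroelement group from the order of domains in its pol ORF."""
    order = tuple(domains)
    if order == TY1_COPIA_ORDER:
        return "Ty1/copia"
    if order == TY3_GYPSY_LIKE_ORDER:
        # Shared by Ty3/gypsy, Pao and retroviruses; RT sequence is needed to separate them
        return "Ty3/gypsy, Pao or retrovirus"
    return "unclassified"


# pCal pol domains in the order shown in Figure 2(A): PR, INT, RT, RNH
print(classify_by_pol_order(["PR", "IN", "RT", "RH"]))
```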

The first Candida retroelement, TCa1, was identified through the discovery of
multiple-copy isolated LTRs dispersed around the genome (1). These LTRs were
discovered in an analysis of moderately repetitive elements. Subsequently, composite
elements, named TCa1, consisting of two LTRs flanking a 5.5kb internal domain were
also found. In the C. albicans strains tested, one to two TCa1 loci were found,
indicating between one and four copies of TCa1 depending on whether the loci were
homozygous or not. TCa1 has many features of a typical retrotransposon, including
388bp LTRs beginning TG and ending CA, with six-nucleotide inverted repeats,
TGTTCG...CGAACA, at either end. The element is flanked by 5bp duplications of the
host DNA and is transcribed to give an approximately unit-length mRNA. Within the
5.5kb internal domain, a (-)PBS and a plus-strand priming site are evident. The (-)PBS
was not immediately obvious: no complementarity to tRNAiMet (as used by Ty1 and
Ty3) could be found. Bases 31 to 39 of tRNA-Arg3 of S. cerevisiae, however,
perfectly complemented the nine bases immediately adjacent to the left LTR
(GATTAGAAG). For some tRNAs there is a high degree of conservation between S.
cerevisiae and C. albicans, leading to the suggestion that a cleavage product of a C.
albicans tRNA-Arg might serve as the primer. This suggestion is supported by the
knowledge that the primer used by the copia retrotransposon is a cleavage product of
tRNAiMet containing only the first 39 nucleotides.
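Both sequence features described above are reverse-complement relationships, which the following Python sketch checks. The helper names are hypothetical. The RNA string printed last is only what strict antiparallel Watson-Crick pairing with GATTAGAAG implies for tRNA bases 31 to 39 (written 5' to 3'); it is derived from the quoted PBS, not taken from a published tRNA sequence, and real primer-template pairing may involve modified bases.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Write a DNA string with RNA letters (T -> U)."""
    return dna.replace("T", "U")


def has_terminal_inverted_repeat(ltr: str, n: int = 6) -> bool:
    """True if the first n bases are the reverse complement of the last n bases."""
    return ltr[:n] == revcomp(ltr[-n:])


# The TCa1 LTR ends quoted in the text form a 6-nt inverted repeat
print(revcomp("CGAACA"))

# Sequence a perfectly complementary tRNA segment would need (5'->3', RNA letters)
print(to_rna(revcomp("GATTAGAAG")))
```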
TCa1 has been shown to be transcriptionally active, but an analysis of 1200bp of its
internal sequence has indicated that it is defective, there being multiple stop codons 
in all three reading frames. It is remarkable, given the clearly non-functional nature of 
this element, that the LTRs remain identical and that the plus- and minus-strand 
priming sites remain in apparently functional form. It is possible that the defective 
TCa1 retrotransposon has been maintained via the passive reverse transcription of its 
RNA by the products of a functional C. albicans retrotransposon. This passive 
replication would require that the element has identical LTRs and functional plus- and 
minus-strand priming sites but would be independent of the element's internal 
sequence. 
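The defectiveness test described above (stop codons in all three reading frames) can be sketched in Python as follows. The function names are hypothetical and only the forward strand is scanned. The stop codons are the same in the standard code and in the C. albicans code, whose CUG reassignment affects a sense codon, not a stop.

```python
# Stop codons are identical in the standard and C. albicans codes (CUG->Ser is not a stop).
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def stops_per_frame(dna: str) -> list[int]:
    """Count in-frame stop codons in each of the three forward reading frames."""
    dna = dna.upper()
    return [
        sum(dna[i:i + 3] in STOP_CODONS for i in range(frame, len(dna) - 2, 3))
        for frame in range(3)
    ]


def open_frames(dna: str) -> list[int]:
    """Forward frames (0, 1, 2) that contain no stop codon."""
    return [frame for frame, n in enumerate(stops_per_frame(dna)) if n == 0]


print(stops_per_frame("ATGAAATAG"), open_frames("ATGAAATAG"))
```

A sequence for which `open_frames` returns an empty list, as reported for the 1200bp of TCa1 analysed, cannot encode a continuous protein on that strand.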

The object of the invention is to provide a novel retrotransposon, in particular the 
isolation and sequencing of pCal, an unusual, novel Ty1/copia retrotransposon from
C. albicans. The free, linear, double-stranded DNA form of this element is so highly 
expressed that it can be seen as a distinct band when uncut genomic C. albicans 
DNA is simply analysed on an agarose gel. It contains features conserved in TCa1 and 
other retrotransposons and has additional features previously unreported in the 
retrotransposon family. 

The sequence of another C. albicans element, potentially retrotransposon-like in 
nature, has recently been submitted to the databases by a group in the U.K. 
(accession no. Y08494). This element has been named beta and is defined as an LTR. 
It consists of a repeated sequence about 400bp in length, flanked by 5bp direct 
repeats of the host DNA, and associated with tRNA genes. The borders of the 
element consist of short, imperfect, inverted repeats: 5'-TAATGTATA...TATACAACA-3'.
Such an element is reminiscent of the isolated LTRs of other retrotransposons, which
are the result of homologous recombination between the ends of a retrotransposon
with the concomitant deletion of the internal region. No significant similarity is
detectable between the beta sequence and the LTRs of TCa1 or pCal of the present
invention.
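How imperfect the beta inverted repeats are can be quantified by comparing the left end with the reverse complement of the right end, as in this Python sketch (helper names hypothetical). For the beta ends quoted above, 7 of the 9 positions match, whereas the TCa1 ends TGTTCG...CGAACA match at all 6 positions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def inverted_repeat_matches(left_end: str, right_end: str) -> tuple[int, int]:
    """Compare the left end with the reverse complement of the right end.

    Returns (matching positions, positions compared).
    """
    target = revcomp(right_end)
    matches = sum(a == b for a, b in zip(left_end.upper(), target))
    return matches, min(len(left_end), len(target))


print(inverted_repeat_matches("TAATGTATA", "TATACAACA"))  # beta element ends
print(inverted_repeat_matches("TGTTCG", "CGAACA"))        # TCa1 LTR ends
```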

SUMMARY OF THE INVENTION 

The invention provides an isolated and purified retrotransposon having a copy number 
of between 40-150 (preferably 50-100) copies of free DNA of itself per genome 
(preferably 10-25 megabases, more preferably substantially 15 megabases). The 
DNA is preferably linear and is more preferably double stranded. 
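The relationship between copy number and the visible free-DNA band can be worked through arithmetically. The sketch below assumes that band intensity is proportional to DNA mass, that the free element is about 6.5 kb (the band size reported for Figure 1) and that copies are counted against the 15 megabase genome stated above; it ignores ploidy and staining non-linearity, and the function names are hypothetical.

```python
def band_fraction(copies: float, element_kb: float = 6.5, genome_mb: float = 15.0) -> float:
    """Fraction of total DNA mass carried in the free-element band."""
    free_kb = copies * element_kb
    return free_kb / (genome_mb * 1000.0 + free_kb)


def copies_from_fraction(fraction: float, element_kb: float = 6.5,
                         genome_mb: float = 15.0) -> float:
    """Invert band_fraction: copies per genome implied by a measured band fraction."""
    return fraction * genome_mb * 1000.0 / (element_kb * (1.0 - fraction))


for n in (40, 50, 100, 150):
    print(n, f"{band_fraction(n):.1%}")
```

Under these assumptions, 50 to 100 copies correspond to roughly 2% to 4% of total DNA, enough to appear as a distinct band on an ethidium-stained gel of uncut DNA.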

The retrotransposon may be isolated from fungi or yeast, preferably Candida and 
more preferably from Candida albicans. 

The invention also provides a novel retrotransposon comprising at least one 
polypeptide positioned between at least two long terminal repeats, and wherein the 
retrotransposon is capable of integrating into the DNA in a genome providing a copy 
number of between 40-150 copies per genome. The copy number is preferably 50- 
100 copies. 

The retrotransposon does not necessarily integrate into the DNA. 
The retrotransposon preferably belongs to the JyMcopia group. 

The retrotransposon is preferably isolated from fungi or yeast, preferably Candida and 
more preferably from Candida albicans. 

The retrotransposon designated pCAL includes two long terminal repeats (LTRs)
flanking an internal domain comprising at least two open reading frames. 
Advantageously, the LTR regions as identified in the sequence illustrated in Figure 2B 
may be used to introduce DNA into the genome of a cell. 
Accordingly, there is also provided by the present invention a method of introducing
DNA into the genome of a cell, which method comprises introducing a transposable
element comprising a nucleotide sequence encoding a desired protein located
between two long terminal repeat sequences having the sequences illustrated in
Figure 2B, which element is such that it can insert into the genome of said cell in the 
presence of an appropriate integration factor. Preferably, said integration factor 
comprises an integrase which preferably is itself included in said transposable element 
and which integrase is derived from the POL region of said pCAL retrotransposon. 

The transposable element for introducing a desired DNA sequence into the genome of
the cell also forms part of the present invention. This transposable element
comprises an internal domain for receiving a nucleotide sequence encoding a desired
protein, flanked by two long terminal repeat regions having the sequences identified in
Figure 2B. The transposable element may advantageously also be included in a DNA
transfer system comprising said transposable element, which is capable of integrating
into the genome of said cell in the presence of an appropriate integration factor, and
said integration factor. In a preferred embodiment, the transposable element
comprises an open reading frame encoding said integration factor, which is an
integrase protein preferably encoded by nucleotide sequence within the POL
region of the retrotransposon of Figure 2B.

The invention provides an isolated and purified retrotransposon comprising a 
nucleotide sequence selected from the group comprising: 
(a) The sequence illustrated in Figure 2B;

(b) A nucleotide sequence with at least 65% similarity with the LTR and POL 
region of Figure 2B; 

(c) A nucleotide sequence that hybridizes under conditions of standard stringency 
to the nucleotide sequence shown in Figure 2B; and 
(d) A functional fragment of (a), (b) or (c).

The retrotransposon is preferably pCal. 

The invention also provides the integrated form of the retrotransposon pCal, which
has been designated TCa2, or sequences capable of hybridising thereto under
standard hybridisation conditions.

The invention also provides an expression vector including any of the aforementioned
retrotransposons or fragments thereof. The expression vector may be used to
transform the cell into which the DNA is to be introduced. The expression vector
may be introduced by any suitable means, such as microinjection, electroporation or
the like. The discovered promoter of RNA transcription is temperature regulated such
that comparatively high levels of transcription occur at up to 37°C. Thus, levels of
transcription may be regulated as required by altering the temperature.

The invention also provides the use of any of the aforementioned retrotransposons in 
a gene disruption system and in a gene discovery system. Upon active 
retrotransposition, the retrotransposon can integrate into new sites in the fungal/yeast
(preferably Candida) genome, causing gene disruption which is preferably
non-revertible. The retrotransposon can be 'tagged' with a selectable marker gene
carrying its own promoter. This disruption system permits discovery (isolation and 
characterisation) of the disrupted gene. 

The invention also provides a retroviral-like carrier system comprising any of the
aforementioned retrotransposons, preferably pCal. The invention gives rise to
virus-like particles in the yeast which can be modified to contain novel proteins such
as enzymes.
The invention also provides a transformation and expression system for fungi/yeast
(preferably Candida) comprising any of the aforementioned retrotransposons. The
discovered promoter functions in a variety of yeasts, including Saccharomyces
cerevisiae, Candida maltosa and Candida albicans.

The invention also provides nucleic acid encoding a retrotransposon having a copy 
number of between 40-150 (preferably 50-100) copies per cell. The invention also 
provides the nucleic acid vector. The vector may be a gene expression vector. The 
vector may be a plasmid. 

The invention also provides cells containing the nucleic acid including transposable 
elements and retrotransposons according to the invention. The cells may be 
contacted with a desired compound to identify its effect on the phenotype of the cell 
conferred by expression of the protein encoded by the nucleotide sequence provided 
in the transposable element.

The invention also provides the linear or circular, double stranded DNA copy of the 
retrotransposon. 

Also provided by the present invention is a method of assigning a function to a
nucleotide sequence, which method comprises providing said sequence between the
long terminal repeat sequences of the transposable element according to claim 1, 5 or
12, introducing it into said cell and monitoring for the presence of an altered
phenotype of said cell compared to a cell which has not had said nucleotide sequence
introduced therein.

The invention also provides a nucleic acid fragment selected from the group 
comprising: 
a) a nucleic acid sequence positioned between at least two long terminal repeats
of the sequence of pCal as described in GenBank accession number
AF007776;

b) a nucleic acid sequence with at least 65% similarity with the LTR and POL
region of the sequence of (a);

c) a nucleic acid sequence that hybridizes under conditions of standard stringency 
to the nucleotide sequence of (a); and 

d) a functional fragment of (a), (b) or (c). 

The nucleic acid sequence preferably comprises a functional POL gene.

More preferably the nucleic acid sequence comprises two long terminal direct repeats
flanking a series of genes in the order gag (group-specific antigen), pol (polymerase),
where the pol sequence comprises an aspartic protease, an integrase and a reverse
transcriptase/RNase H, particularly as seen in Figure 2B.

The invention also comprises a functional inducible promoter isolated from a
retrotransposon according to the invention. The promoter is preferably temperature
inducible.

The invention also provides novel retrotransposons isolated from fungi/yeast, 
preferably Candida. In particular the invention provides retrotransposons 1-28 and 
more particularly retrotransposon 15. 

The invention provides the use of the sequences 1-28 as probes and also provides
use of the sequences 1-28 in any of the gene disruption systems, gene discovery 
systems, retroviral-like carrier systems, transformation and expression systems above. 
The invention also provides the use of the sequences 1-28 in an expression vector as
above. 

The invention provides amino acid sequence equivalents to the nucleic acid sequences 
herein described. 

Furthermore, the invention comprehends uses of the retrotransposons, the nucleic
acids (e.g., DNA and RNA) and the amino acid sequences of the invention, such as
methods employing, and/or compositions containing and/or comprising, one or more of
a retrotransposon, nucleic acid (e.g., DNA or RNA) and/or amino acid sequence of the
invention, including, for instance, wherein the retrotransposon is a vector containing
and/or expressing an exogenous nucleic acid molecule.

BRIEF DESCRIPTION OF THE DRAWINGS 

Embodiments of the invention are now described, by way of example only, with 
reference to the drawings, in which: 

Figure 1 shows the presence of a high copy number, extrachromosomal element in C. albicans
strain hOG1042. An uncut sample of hOG1042 DNA was electrophoresed on
a 1% agarose gel alongside some marker DNA (sizes in kb indicated at left). A distinct
band of about 6.5kb running ahead of the bulk of the chromosomal DNA (>20kb) 
indicates the presence of an extrachromosomal element in this strain. The relative 
intensity of the band suggests that the element exists at about 50-100 copies per cell 
(see text). The gel photo was scanned using a BIO-RAD GS-670 Imaging 
Densitometer and annotated using Adobe Photoshop™ 4.0. 

Figure 2(A) shows the general structure of pCal. The boxed triangles represent the 
LTRs. The long boxes represent the internal region. The arrows below the boxes 
indicate the extent of the two long open reading frames. The positions of the encoded
products are indicated: GAG, structural protein of the virus-like particle; PR, protease; 
INT, integrase; RT, reverse transcriptase; RNH, RNaseH. The termination codon at the 
end of each ORF is indicated by a vertical line. Selected restriction sites are shown 
above the diagram: B, BglII; P, PstI; Sac, SacI; A, Asp718; Sal, SalI; E, EcoRI.

Figure 2(B) shows the complete nucleotide sequence of pCal and the deduced amino acid
sequence of the two long ORFs (translated using the non-standard C. albicans genetic
code). Every tenth nucleotide is indicated by a dot above the sequence. The terminal
inverted repeats of the LTRs are underlined. The putative poly-A signal and TATA
boxes are highlighted in bold and labelled above the sequence. The minus-strand
primer-binding site [(-)PBS] and the additional region complementary to the tRNA-Arg
fragment are in italics. The stop codon at the end of the gag ORF, the adjacent
purine-rich tract (PRT) and the stems of the pseudoknot are highlighted in bold. The
PRT is also in italics. The 5' and 3' limits of the pseudoknot are indicated by < and
>, respectively. The 3' polypurine tract (PPT1) and internal polypurine tract (PPT2)
are highlighted in bold.
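Translation with the non-standard C. albicans code can be sketched in Python by taking the standard genetic code and reassigning CUG (CTG in DNA) from leucine to serine, the best-known difference in this yeast's code. The table construction and function names below are illustrative assumptions, not the software used to prepare Figure 2(B); ambiguous bases are not handled and would raise a `KeyError`.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    a + b + c: STANDARD_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# C. albicans reads CUG (CTG in DNA) as serine rather than leucine.
CANDIDA_CODE = {**STANDARD_CODE, "CTG": "S"}


def translate(dna: str, code: dict[str, str] = CANDIDA_CODE) -> str:
    """Translate reading frame 0 of a DNA string; '*' marks stop codons."""
    dna = dna.upper()
    return "".join(code[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


print(translate("ATGCTGTAA", STANDARD_CODE), translate("ATGCTGTAA"))
```

Translating pCal with the standard code would therefore misassign every CUG-encoded residue in the gag and pol ORFs.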

Figure 3 shows the plus- and minus-strand priming sites of pCal. (A) Minus-strand
primer-binding site. The region of pCal around the (-)PBS (bottom) is shown compared
to the first 39 bases of tRNA-Arg3 of S. cerevisiae (top). The region of pCal shown
here extends from base 271 to 341. The bases of pCal within the LTR are underlined.
For clarity, the bases of the tRNA molecule are shown in their unmodified form. (B) A
comparison of the polypurine tracts of pCal and TCa1. The TCa1 and pCal 3' PPTs
are adjacent to the right LTRs. The pCal internal PPT (bases 3455-3465) is from
within the presumed integrase coding region.

Figure 4 shows the conserved motifs in the pol ORF of pCal compared to those of
other Ty1/copia retrotransposons. Absolutely conserved amino acids are indicated by
an asterisk (*). Positions containing 4 or 5 identical amino acids, or in which only
two types of amino acid are present, are indicated by a caret (^). The numbers in
brackets indicate the positions of the motifs from the start of the gag/pol fusion
proteins.

PATENT
674521-2001.1

Figure 5 shows the comparison of the putative pseudoknot structures of Moloney 
murine leukemia virus (A) and pCal (B) at the boundary of their gag and pol ORFs. The 
stop codons are shown in bold and the 8bp purine-rich tract in italics. The long lines 
represent the base pairings in the second stems. Note that in pCal there are two 
downstream regions to which the first loop of the pseudoknot can anneal. The 
nucleotides in the bulge of the first stem of pCal also have a downstream region to 
which they can potentially anneal (bases marked *). Base pairing between these 
sequences could lead to the formation of an alternative pseudoknot. 

Figure 6 shows the phylogenetic tree of some LTR retroelements. The data used in 
the tree construction were the predicted amino acids of the seven conserved domains 
of reverse transcriptase identified by Xiong and Eickbush (1990). The tree was 
constructed using the UPGMA method available within the PHYLIP package 
(Felsenstein 1989). The percentages of trees, from 500 bootstrap replications,
supporting each branch are indicated. Non-LTR retrotransposons were used as an 
outgroup to root the tree. The accession numbers for the sequences of the elements 
can be found in the Materials and Methods section of the Detailed Description. 
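The clustering step of UPGMA can be sketched in a few lines of Python: the closest pair of clusters is merged repeatedly, and distances to the merged cluster are averaged with weights equal to cluster sizes. This is a hypothetical illustration, not PHYLIP code; it starts from an already computed, made-up distance matrix, returns only the tree topology (no branch lengths) and performs no bootstrap resampling. Leaf labels must not contain commas or parentheses.

```python
from itertools import combinations


def upgma(labels, distance):
    """Return a Newick-style topology (no branch lengths) built by UPGMA.

    `distance(a, b)` gives the distance between two leaf labels.
    """
    size = {lab: 1 for lab in labels}  # cluster name -> number of leaves
    d = {frozenset(p): distance(*p) for p in combinations(labels, 2)}
    while len(size) > 1:
        pair = min(d, key=d.get)  # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        na, nb = size.pop(a), size.pop(b)
        del d[pair]
        for c in size:  # size-weighted average distance to every remaining cluster
            d[frozenset((merged, c))] = (
                na * d.pop(frozenset((a, c))) + nb * d.pop(frozenset((b, c)))
            ) / (na + nb)
        size[merged] = na + nb
    return next(iter(size))


# Hypothetical distances between four elements A-D
toy = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
       ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}


def toy_distance(x, y):
    return toy.get((x, y), toy.get((y, x)))


print(upgma(["A", "B", "C", "D"], toy_distance))
```

Because UPGMA assumes a constant rate of change across lineages, the root it produces is implied by that assumption; rooting with an outgroup, as done for Figure 6, is a separate check.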

Figure 7 shows that the expression of pCal DNA occurs in a temperature- and
strain-dependent manner. Cultures of the seven indicated C. albicans strains were grown at
27°C and 37°C to late log/early stationary phase, after which total DNA was
isolated. Approximately equal amounts of undigested DNA samples from each culture
were then electrophoresed on an agarose gel and transferred to a nylon membrane.
The membrane was then probed with an internal fragment of pCal. In the gel blot
shown above, the extrachromosomal pCal forms appear as a band running at about
6.5 kb and a smear of shorter forms running between 3 and 6.5 kb. The integrated
chromosomal copies of TCa2 appear as a band at >20 kb.

Figure 8 shows that TCa2 RNA expression occurs in a similar pattern to the
expression of pCal DNA. Total RNA was isolated from cultures of the seven C.
albicans strains, grown at 27°C or 37°C, as for the DNA in Fig. 7. Approximately
equal amounts of RNA from each culture were then separated on agarose gels,
transferred to nylon membranes and probed with the pCal internal probe. With longer
exposures, TCa2 RNA could be detected in all of the strains.

Figure 9 shows the comparison of the 5' regions of TCa2 retrotransposons from the
various strains. The first approximately 400 bp of TCa2 retrotransposons from each of
the seven strains, except hOG1042, were amplified by PCR and cloned into a plasmid
vector. The inserts of two clones from each strain were then sequenced and the
sequences are compared above. The clones are labelled according to the strain they
were derived from; for example, the first clone from ATCC10261 is ATC-1, the second
clone from SC5314 is SC5-2, etc. Also shown are the sequences of p30 and p36, two of
the original clones of pCal from hOG1042. The 5' half of the published pCal sequence
was derived from p36. The sequences of the clones are listed in order corresponding
to the amount of TCa2 RNA produced by the host strain, i.e. SGY269 produces the
least and hOG1042 the most. The 6 bp inverted repeats at the ends of the LTRs are
overlined.

Figure 10 shows the possible secondary structure of the minus-strand priming
complex. The sequence of clone p759-2 is shown as it might appear bound to the C.
albicans tRNA-Arg(UCU) fragment. The PBS of this clone is a perfect 32 bp match to
the tRNA fragment. The remainder of the 5' untranslated region has the potential to
form a stem-loop structure. The nucleotides of the retrotransposon from within the
LTR are underlined. The AUG codon at the start of the gag ORF is shown in boldface.

Figure 11(A) shows the location of the TCa2 probes and some important restriction
sites. The structure of TCa2 is represented as the long box and the LTRs are the
boxed triangles. The locations of the LTR and internal probes and certain restriction
sites are indicated: P, PstI; C, ClaI; A, Asp718; E, EcoRI. (B) Copy number of TCa2.
DNA was isolated from cells grown at 27°C, then digested with EcoRI. The resulting
fragments were separated on an agarose gel, then transferred to a nylon membrane.
The DNA immobilized on the membrane was then hybridized to the TCa2 internal
probe. Lane 1, hOG1042; lane 2, SGY269; lane 3, SC5314; lane 4, ATCC10261;
lane 5, SA40; lane 6, F16932; lane 7, C. maltosa; lane 8, C. parapsilosis; lane 9, C.
tropicalis; lane 10, C. pseudotropicalis. Sizes in kb are indicated at the left of the
picture. (C) Copy number of the TCa2 LTR. The membrane used in panel B was
stripped and then reprobed with the TCa2 LTR.

Figure 12 shows the determination of TCa2 copy number in hOG759 and hOG1042.
High molecular weight chromosomal DNA from each of the strains was purified away
from the extrachromosomal copies of pCal as described in Materials and methods and
then subjected to Southern analysis using the pCal internal probe. The DNA was
digested with PstI (lanes 1 and 2), EcoRI (lanes 3 and 4) or ClaI (lanes 5 and 6). Lanes
1, 3 and 5, hOG759; lanes 2, 4 and 6, hOG1042. Sizes in kb are indicated to the left.

Figure 13 shows the plasmid pRPU3. The CaARS from pCARS (originally the SphI
fragment from pRC2312) was ligated as a HindIII/BamHI fragment into pRPU2.

Figure 14 shows the plasmid pTIM1/2. Using CAL1 and CAL2 primers on p36
template, the SacI/XbaI products were cloned into p36K (creating p36Kf1) and then
into pUXLC (creating pTIM1/p36flUXLC) and pUXILC (creating pTIM2/p36flUX1LC).

Figure 15 shows a Southern analysis using the TCa2 probe:

Lane 1, hOG759, PstI cut, TCa2 probe
Lane 2, hOG1042, PstI cut, TCa2 probe
Lane 3, hOG759, EcoRI cut, TCa2 probe
Lane 4, hOG1042, EcoRI cut, TCa2 probe
Lane 5, hOG759, ClaI cut, TCa2 probe
Lane 6, hOG1042, ClaI cut, TCa2 probe



Figure 16 shows the generation of additional bands hybridising to TCa2 after culture.
hOG1042 was grown for approximately 30 days in rich medium at 37°C by
continually transferring cells between flasks. Nine independent colonies were isolated
from the final passage. Genomic DNA was isolated from each of these colonies, and
also from hOG1042 and hOG762 (a precursor of hOG1042). EcoRI-digested samples
of DNA from each strain were then subjected to Southern blotting using as a probe a
2 kb fragment of TCa2 corresponding to the reverse transcriptase coding sequence.
The results are shown in the figure. Lanes: 1, hOG762; 2, hOG1042; 3 to 11, nine
independent strains derived from hOG1042 after growth for approximately 30 days at
37°C. In several of the strains which had been subjected to passage at 37°C, TCa2
hybridised to more bands than in the parent hOG1042; for instance, additional high
molecular weight bands can be seen in lanes 4, 6 and 11. Additional bands of
various sizes were also visible in these and other strains when the DNA was digested
with other enzymes (not shown). Gain of bands was never found to be associated
with the loss of any of the original bands, suggesting that the new bands represent
additional copies of TCa2.



Figure 17 shows the nucleic acid sequence of retrotransposon 1, which is 1309 base pairs long.
Figure 18 shows the nucleic acid sequence of retrotransposon 2. 
Figure 19 shows the nucleic acid sequence of retrotransposon 3. 

Figure 20 shows the nucleic acid sequence of retrotransposon 4.
Figure 21 shows the nucleic acid sequence of retrotransposon 5.
Figure 22 shows the nucleic acid sequence of retrotransposon 6.
Figure 23 shows the nucleic acid sequence of retrotransposon 7.
Figure 24 shows the nucleic acid sequence of retrotransposon 8.

Figure 25 shows the amino acid sequence of the pol protein of retrotransposon 8.
Figure 26 shows the nucleic acid sequence of retrotransposon 9. This has a TCa2-like LTR.

Figure 27 shows the nucleic acid sequence of retrotransposon 10. This has a TCa2-like LTR.

Figure 28 shows the nucleic acid sequence of retrotransposon 11. This also has a TCa2-like LTR.

Figure 29 shows the DNA sequence of retrotransposon 12. This also has a TCa2-like LTR.

Figure 30 is the nucleic acid sequence of retrotransposon 13. This also has a TCa2-like LTR.

Figure 31 shows the nucleic acid sequence of retrotransposon 14. The pol protein is
from nucleic acids 1169-1839.

Figure 32 shows the nucleic acid sequence of retrotransposon 15. The pol protein is
from 1555-4302 base pairs. The LTR regions are from 979-1292 and 5212-5525
base pairs.

Figure 33 shows the amino acid sequence of retrotransposon 15. The pol protein is
916 amino acids.

Figure 34 shows the nucleic acid sequence of retrotransposon 16. The pol protein is
from 309-2332 base pairs.

Figure 35 shows the amino acid sequence of retrotransposon 16. The pol protein is 
748 amino acids. 
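The coordinate ranges quoted in these figure descriptions can be checked with simple arithmetic. The following Python sketch assumes, as is usual for sequence listings, that a range such as 1555-4302 is 1-based and includes both end points. On that assumption the pol range of retrotransposon 15 spans 2748 nucleotides, i.e. 916 codons, and each of its two LTRs spans 314 base pairs. The function names are illustrative only.

```python
def span_length(start: int, end: int) -> int:
    """Length of a 1-based range that includes both end points, e.g. 1555-4302."""
    if end < start:
        raise ValueError("end precedes start; reversed ranges are not handled here")
    return end - start + 1


def codon_count(start: int, end: int) -> float:
    """Number of codons in a coding range; fractional if not a multiple of 3."""
    return span_length(start, end) / 3


if __name__ == "__main__":
    # pol coding range and LTRs quoted for retrotransposon 15 (Figure 32).
    print(span_length(1555, 4302), codon_count(1555, 4302))  # 2748 916.0
    print(span_length(979, 1292), span_length(5212, 5525))   # 314 314
```

The function deliberately raises an error for a reversed range, such as 390-377 in the description of Figure 45, rather than guess whether it denotes the opposite strand or a typographical error.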

Figure 36 shows the DNA sequence of retrotransposon 17. The LTR zeta is from 
887-1394 base pairs. 
Figure 37 shows the nucleic acid sequence of retrotransposon 18. The LTR zeta is
from 1418-1926 base pairs.

Figure 38 shows the nucleic acid sequence of retrotransposon 19. The LTR zeta is
from 767-1274 base pairs.

Figure 39 shows the nucleic acid sequence of retrotransposon 20. The LTR zeta is
from 3344-3851 base pairs.

Figure 40 shows the nucleic acid sequence of retrotransposon 21. The LTR zeta is
from 812-1319 base pairs.

Figure 41 shows the nucleic acid sequence of retrotransposon 22. The LTR zeta is
from 672-1179 base pairs.

Figure 42 shows the nucleic acid sequence of retrotransposon 23. The LTR zeta is
from 467-974 base pairs.

Figure 43 shows the nucleic acid sequence of retrotransposon 24. The LTR zeta is
from 787-1294 base pairs.

Figure 44 shows the nucleic acid sequence of retrotransposon 25.

Figure 45 shows the nucleic acid sequence of retrotransposon 26. The pol protein is
from 2-322 base pairs. The LTR san is from 390-377 base pairs.

Figure 46 shows the amino acid sequence of retrotransposon 26. The pol protein is
106 amino acids.

Figure 47 shows the nucleic acid sequence of retrotransposon 27. The LTR san is
from 143-523 base pairs.

Figure 48 shows the nucleic acid sequence of retrotransposon 28. The LTR san is
from 558-939 base pairs.

Figure 49 shows the outline of the construction of the plasmid pRPU3. Plasmids from
which DNA was derived in this work are indicated by a circle. The rectangular boxes
indicate PCR products.

Figure 50 shows the construction of pTIM2 and p36f4UX1LC. These plasmids
contain a yeast autonomously replicating sequence (CARS) and the C. albicans URA3
gene. In both plasmids the URA3 gene uses the promoter in the left LTR and relies on
the transcription termination signals in the right LTR. p36f4UX1LC also contains the
gag ORF of pCal as a fusion product with the URA3 gene. The rectangular boxes
represent PCR products and the circles the original plasmids from which DNA was
obtained.

Figure 51 shows the outline of the construction of the plasmid pNRE5 used in an in vivo
construction in the C. maltosa strain CHAU1.
Figure 52 shows the number of transformed colonies per µg DNA.
Figure 53 shows that expression of pCal DNA occurs in a temperature- and strain-dependent
manner. Cultures of the seven indicated C. albicans strains were grown at
27°C and 37°C to late log/early stationary phase, after which total DNA was
isolated. Approximately equal amounts of undigested DNA samples from each culture
were then electrophoresed on an agarose gel and transferred to a nylon membrane.
The membrane was then probed with an internal fragment of pCal. In the gel blot
shown, the extrachromosomal pCal forms appear as a band running at about
6.5 kb and a smear of shorter forms running between 3 and 6.5 kb. The integrated
chromosomal copies of TCa2 appear as a band at >20 kb.

Figure 54 shows that TCa2 RNA expression occurs in a similar pattern to the expression
of pCal DNA. Total RNA was isolated from cultures of the seven C. albicans strains,
grown at 27°C or 37°C, as for the DNA in Figure 53. Approximately equal amounts of
RNA from each culture were then separated on agarose gels, transferred to nylon
membranes and probed with the pCal internal probe. With longer exposures, TCa2
RNA could be detected in all of the strains.

Figure 55 is a Southern analysis of URA3+ colonies derived from two Candida strains,
hOG1051 and hOG963. Genomic DNA from URA3+ colonies and their parental
strains was digested with EcoRV and probed with the URA3 gene (shown in the
schematic diagram).

Figure 56 shows ABI PRISM chromatogram H963RU59, that is, the sequence surrounding
a TCa2/URA3 element integrated into a new position in the Candida genome.
Position 291 shows the start codon of an ORF corresponding to a probable membrane
protein. Position 276 represents the insertion site of TCa2/URA3, within the ORF.
Figure 57 is a summary of the integration sites of TCa2/URA3 and the sequences
around the integration sites.
Figure 58 is an ORF map of contig 4-2824 and shows the integration site in
H963RU59 (URA3+).

Figure 59 shows an analysis of intron processing from the ura3 gene. The URA3 gene
was placed into TCa2 in all possible combinations. The vector was then transformed
into C. albicans CAI-4 and URA3+ transformants were selected. Constructs that
gave rise to URA3+ colonies are indicated.

Figure 60 shows the integration of pRUIA. Integration of pRUIA results in the
formation of a functional ADE2 gene. 

Figure 61 shows a Southern analysis of pRUIA integrated into hOG1051 and
hOG963. Southern analysis was performed using a URA3 probe, shown in the
schematic diagram. Genomic DNA has been digested with EcoRI (E), HindIII (H) or
XbaI (X). H1051R appears to contain two copies of pRUIA.

Figure 62 shows a Northern analysis of tagged TCa2. RNA was isolated from cultures
grown at 27°C and 37°C. The URA3 gene probe was used in this analysis. The arrow
indicates the transcript containing the tagged TCa2 (approximately 7 kb).
Figure 63 shows a tagged retrotransposition.

Figure 64 shows the production of URA3+ colonies. Approximately 10⁷ cells were
plated on each of the four plates. Only strains containing pRUIA give rise to URA3+
colonies.

Figure 65 shows a Southern analysis of URA3+ colonies. Genomic DNA from URA3+
colonies and their parental strains was digested with EcoRV and probed with the
URA3 gene (shown in the schematic diagram).

Figure 66 shows the general principle of inverse PCR as applied in this analysis. The 
agarose gel shows the result of inverse PCR on 10 independent tagged 
retrotransposition events. 

Figure 67 shows ORF maps of tagged retrotransposition events. The arrow at the
integration site indicates the direction of the TCa2 element. Tentative annotations of
the ORFs have been made. Only the ORFs closest to the insertion site are shown.

Figure 68 shows the distribution of TCa2 insertions in relation to the nearest ORF. 
Figure 69 is an analysis of the sequence around the insertion site. All sequences are
shown in the same orientation with respect to the integrated TCa2.

Figure 70 shows the removal of an integrated retrotransposon. Recombination
between LTR sequences results in the loss of the URA3 gene. The result of this
recombination is a solo LTR.
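The outcome of the LTR-LTR recombination shown in Figure 70 can be modelled as a simple string operation. The following Python sketch is an illustration only, not the experimental procedure: it assumes two identical, directly repeated LTRs and uses short hypothetical toy sequences in place of real TCa2 sequence. Because the two LTRs are identical, the product is the same wherever within them the crossover occurs: the flanking DNA joined by a single (solo) LTR, with the internal region, including the URA3 tag, removed.

```python
def solo_ltr(locus: str, ltr: str) -> str:
    """Return the locus after recombination between two directly repeated LTRs.

    Simplified model: the first and last exact occurrences of `ltr` in `locus`
    are taken to be the two LTRs of one integrated element.
    """
    left = locus.find(ltr)
    right = locus.rfind(ltr)
    if left == -1 or right < left + len(ltr):
        raise ValueError("two non-overlapping copies of the LTR are required")
    # The left LTR and the internal region (including any URA3 tag) are lost;
    # one LTR copy remains between the original flanks.
    return locus[:left] + locus[right:]


if __name__ == "__main__":
    # Hypothetical toy sequences, not TCa2 sequence.
    flank_5, flank_3 = "GGAATT", "CCAAGG"
    ltr, internal = "TGTAACA", "ATGTCTAAAGCC"
    locus = flank_5 + ltr + internal + ltr + flank_3
    print(solo_ltr(locus, ltr))  # GGAATTTGTAACACCAAGG
```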

Figure 71 shows the nucleotide sequences of a further 38 retrotransposons.

Figure 72 is an overview table of the additional 38 sequences.

DETAILED DESCRIPTION OF THE INVENTION

Retrotransposons have many uses. Retrotransposons can be used as vectors for the
expression, either in vivo or in vitro, of exogenous nucleic acid molecules.
Retrotransposons thus can also be used in immunological, immunogenic or vaccine
compositions, as well as in therapeutic compositions. Further, retrotransposons can
be used for eliciting an immunological or immunogenic or protective immunological
(vaccine) response, as well as a therapeutic response. Retrotransposons can be used
for gene insertion and expression studies in cell culture, for gene therapy, for the
generation of transgenic animals, and in applications where traditional RNA retroviral
vectors may be used (as well as in instances where such RNA retroviral vectors
theoretically may be employed but may be considered unsafe or undesirable).

For instance, reference is made to: Gilbert et al., Biol Chem 380(3):299-303 (March
1999); Plebanski et al., Eur J Immunol 28(12):4345-55 (Dec. 1998); Garcia-Valcarcel
et al., Vaccine 15(6-7):709-10 (Apr-May 1997); Poggeler et al., Biochem Biophys Res
Commun 219(3):890-9 (Feb 1996); Kingsman et al., Ann NY Acad Sci 754:202-13
(May 1995); Adams et al., Mol Biotechnol 1(2):125-35 (Apr 1994); Adams et al., Int
Rev Immunol 11(2):133-41 (1994); Kingsman et al., Trends Biotechnol 9(9):303-9
(Sep 1991); Cook et al., Biotechnology 9(8):748-51 (Aug 1991); Kingsman et al.,
Vaccine 6(4):304-6 (1988); Malim et al., Nucleic Acids Res 15(18):7571-80 (1987);
WO88/03169; WO92/07950; WO94/20608; and U.S. Patent Nos. 5,041,385,
5,354,674, 5,879,933, 5,969,126, 5,925,565, 5,885,971, 5,916,804 and
5,292,662. These documents relate to retrotransposons and uses thereof, such as
introducing nucleotide sequences or nucleic acid molecules of interest into certain cells
(expression systems, e.g., of a 72-kDa mitochondrial polypeptide), gene transfer,
position-specific insertion vectors, vaccines (or immunological, immunogenic or
therapeutic compositions; in vivo presentation of antigens or therapeutics, or antigen
or therapeutic delivery systems, such as for antigens from Plasmodium, varicella
zoster or HIV, other viral antigens, or for therapeutics such as interferon), purification,
presentation or targeting vehicles, and carriers or adjuvants, and the like. Indeed,
these documents demonstrate that retrotransposons "can be administered safely in
humans" (Plebanski et al., supra). Inventive nucleic acid molecules (DNA, RNA),
amino acids and retrotransposons can be used in the same fashion as previous
retrotransposons and thus can be formulated and used in the fashion in which
retrotransposons are formulated in the herein cited documents.

Thus, for instance, retrotransposons of the invention can be used to express nucleic
acid molecules and can be formulated in compositions such as immunogenic,
immunological or vaccine compositions. An immunological composition elicits an
immunological response, local or systemic. The response can be, but need not be,
protective. An immunogenic composition likewise elicits a local or systemic
immunological response which can be, but need not be, protective. A vaccine
composition elicits a local or systemic protective response. Accordingly, the terms
"immunological composition" and "immunogenic composition" include a "vaccine
composition" (as the two former terms can be protective compositions).

With respect to nucleic acid molecules and polypeptides of the invention, the nucleic
acid molecules and polypeptides advantageously have at least about 65% or greater
homology or identity or similarity with herein disclosed sequences, e.g., at least 70%,
such as at least 75%, or at least 80% or advantageously at least 85%, for instance
at least 90%, such as at least 95% or even 97% or 100%, similarity or homology or
identity with herein disclosed sequences, such as (a) the LTR and/or POL region of
Fig. 2B, or (b) the sequence illustrated in Fig. 2B, or (c) a nucleic acid sequence
positioned between at least two long terminal repeats of the sequence of pCal as in
GenBank accession number AF007776, or (d) a LTR and/or POL region of (c), or (e)
any of sequences 1-28, or (f) any of retrotransposons 1-28, or (g) a sequence which
hybridizes under standard stringent conditions to any of (a)-(f), or (h) a functional
fragment of any of (a)-(g) (including subsequences discussed below).

Nucleotide sequence homology or identity or similarity can be determined using the
"Align" program of Myers and Miller ("Optimal Alignments in Linear Space", CABIOS
4, 11-17, 1988, incorporated herein by reference), available at NCBI.
Alternatively or additionally, the term "homology" or "identity", for instance, with
respect to a nucleotide or amino acid sequence, can indicate a quantitative measure
of homology between two sequences. The percent sequence homology can be
calculated as $(H_{\mathrm{ref}} - H_{\mathrm{dif}}) \times 100 / H_{\mathrm{ref}}$, wherein $H_{\mathrm{dif}}$ is the total number of non-identical
residues in the two sequences when aligned and wherein $H_{\mathrm{ref}}$ is the number of
residues in one of the sequences. Hence, the DNA sequence AGTCAGTC will have a
sequence similarity of 75% with the sequence AATCAATC ($H_{\mathrm{ref}} = 8$; $H_{\mathrm{dif}} = 2$).
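The percent homology formula above translates directly into code. This Python sketch assumes the two sequences have already been aligned without gaps to the same length, so that Href is simply the common length and Hdif the number of mismatched positions; it reproduces the 75% value given for AGTCAGTC and AATCAATC.

```python
def percent_homology(seq_a: str, seq_b: str) -> float:
    """(Href - Hdif) * 100 / Href for two gaplessly aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    h_ref = len(seq_a)  # number of residues in one of the sequences
    h_dif = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)  # mismatches
    return (h_ref - h_dif) * 100 / h_ref


if __name__ == "__main__":
    print(percent_homology("AGTCAGTC", "AATCAATC"))  # 75.0
```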

Alternatively or additionally, "homology" or "identity" with respect to sequences can
refer to the number of positions with identical nucleotides or amino acids divided by
the number of nucleotides or amino acids in the shorter of the two sequences, wherein
alignment of the two sequences can be determined in accordance with the Wilbur and
Lipman algorithm (Wilbur and Lipman, 1983 PNAS USA 80:726, incorporated herein
by reference), for instance, using a window size of 20 nucleotides, a word length of 4
nucleotides and a gap penalty of 4; computer-assisted analysis and interpretation
of the sequence data, including alignment, can be conveniently performed using
commercially available programs (e.g., Intelligenetics™ Suite, Intelligenetics Inc., CA).
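The measure "identical positions divided by the length of the shorter sequence" can be sketched as follows. The sketch does not implement the Wilbur and Lipman algorithm itself; it assumes an alignment has already been produced (by that or any other method) and is supplied as two equal-length strings in which "-" marks a gap. The example alignment is hypothetical.

```python
def identity_over_shorter(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity: identical aligned positions / length of the shorter sequence.

    The alignment is assumed to be given as two equal-length strings in which
    `gap` marks an inserted gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned strings must have the same length")
    a, b = aligned_a.upper(), aligned_b.upper()
    # Count columns where both residues match (gap-gap columns are not counted).
    identical = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    # Ungapped length of the shorter of the two sequences.
    shorter = min(len(a.replace(gap, "")), len(b.replace(gap, "")))
    if shorter == 0:
        raise ValueError("both sequences must contain residues")
    return identical * 100 / shorter


if __name__ == "__main__":
    # Hypothetical alignment: 4 identical positions; the shorter sequence has 5 residues.
    print(identity_over_shorter("ACGTTCGT", "ACG---GA"))  # 80.0
```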
When RNA sequences are said to be similar, or to have a degree of sequence identity or
homology with DNA sequences, thymidine (T) in the DNA sequence is considered
equal to uracil (U) in the RNA sequence. RNA sequences within the scope of the
invention can be derived from DNA sequences by considering thymidine (T) in the
DNA sequence equal to uracil (U) in RNA sequences.
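The convention that T in a DNA sequence equals U in an RNA sequence amounts to mapping both sequences onto one alphabet before they are compared, as in this minimal Python sketch (the helper names are illustrative).

```python
def to_dna_alphabet(seq: str) -> str:
    """Map a DNA or RNA sequence onto the DNA alphabet (U counted as T)."""
    return seq.upper().replace("U", "T")


def dna_to_rna(seq: str) -> str:
    """Derive the RNA form of a DNA sequence (T counted as U)."""
    return seq.upper().replace("T", "U")


def same_sequence(dna: str, rna: str) -> bool:
    """True if the DNA and RNA sequences are identical once T and U are equated."""
    return to_dna_alphabet(dna) == to_dna_alphabet(rna)


if __name__ == "__main__":
    print(dna_to_rna("ATGTCT"))               # AUGUCU
    print(same_sequence("ATGTCT", "AUGUCU"))  # True
```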

Additionally or alternatively, nucleotide and/or amino acid sequence similarity or
identity or homology can be determined using the BlastP program (Altschul et al.,
Nucl. Acids Res. 25:3389-3402 (1997), incorporated herein by reference), available at
NCBI. The following references (each incorporated herein by reference) also provide
algorithms for comparing the relative identity or homology or similarity of amino acid
residues of two proteins, and, additionally or alternatively with respect to the
foregoing, the teachings in these references can be used for determining percent
homology or identity: Needleman SB and Wunsch CD, "A general method applicable
to the search for similarities in the amino acid sequences of two proteins," J. Mol.
Biol. 48:443-453 (1970); Smith TF and Waterman MS, "Comparison of
Biosequences," Advances in Applied Mathematics 2:482-489 (1981); Smith TF,
Waterman MS and Sadler JR, "Statistical characterization of nucleic acid sequence
functional domains," Nucleic Acids Res. 11:2205-2220 (1983); Feng DF and Doolittle
RF, "Progressive sequence alignment as a prerequisite to correct phylogenetic trees,"
J. Mol. Evol. 25:351-360 (1987); Higgins DG and Sharp PM, "Fast and
sensitive multiple sequence alignment on a microcomputer," CABIOS 5:151-153
(1989); Thompson JD, Higgins DG and Gibson TJ, "CLUSTAL W: improving the
sensitivity of progressive multiple sequence alignment through sequence weighting,
position-specific gap penalties and weight matrix choice," Nucleic Acids Res.
22:4673-4680 (1994); and Devereux J, Haeberli P and Smithies O, "A
comprehensive set of sequence analysis programs for the VAX," Nucl. Acids Res.
12:387-395 (1984).
Furthermore, as to inventive nucleic acid molecules, the invention comprehends codon
equivalent nucleic acid molecules. For instance, if the invention comprehends "X"
protein having amino acid sequence "A" and nucleic acid molecule "N" encoding
protein X, the invention comprehends nucleic acid molecules that also encode protein
X via one or more different codons than in nucleic acid molecule N.
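Whether two nucleic acid molecules are codon equivalent in this sense can be checked by translating both and comparing the encoded protein sequences. The Python sketch below builds the standard genetic code table. Because C. albicans translates the CUG codon as serine rather than leucine, the sketch offers an optional flag for that reassignment; the flag name and the example sequences are illustrative assumptions, and only complete codons read from the first base are translated.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TTT, TTC, TTA, TTG, TCT, ... order; "*" marks a stop codon.
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(ctg_as_serine: bool = False) -> dict[str, str]:
    """Codon -> one-letter amino acid; optionally decode CTG (CUG) as serine."""
    table = {"".join(codon): aa
             for codon, aa in zip(product(BASES, repeat=3), STANDARD_CODE)}
    if ctg_as_serine:
        table["CTG"] = "S"
    return table


def translate(seq: str, table: dict[str, str]) -> str:
    """Translate complete codons from the first base; U is treated as T."""
    seq = seq.upper().replace("U", "T")
    return "".join(table[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def codon_equivalent(seq_1: str, seq_2: str, ctg_as_serine: bool = False) -> bool:
    """True if both molecules encode the same amino acid sequence."""
    table = codon_table(ctg_as_serine)
    return translate(seq_1, table) == translate(seq_2, table)


if __name__ == "__main__":
    print(codon_equivalent("ATGTCTAAA", "ATGAGCAAG"))         # True (both M-S-K)
    print(codon_equivalent("CTG", "TTA"))                      # True (L and L)
    print(codon_equivalent("CTG", "TTA", ctg_as_serine=True))  # False (S and L)
```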

In addition, as to inventive nucleic acid molecules, the invention comprehends nucleic
acid molecules that hybridize under stringent conditions to herein disclosed nucleic
acid molecules.

As to herein disclosed amino acid sequences, the invention comprehends nucleic acid
molecules encoding the herein disclosed amino acid sequences, as well as nucleic acid
molecules that hybridize under stringent conditions to nucleic acid molecules encoding
herein disclosed amino acid sequences, as these nucleic acid molecules that hybridize
under stringent conditions to nucleic acid molecules encoding herein disclosed amino
acid sequences can provide proteins having similarity, homology or identity as herein
discussed.

The disclosed nucleic acid sequences or portions or fragments thereof, e.g.,
subsequences at least about 12 nucleotides in length, for instance, at
least about 15, about 18, about 21, about 24 or about 27 nucleotides in length, such
as at least about 30, about 33, about 36, about 39 or about 42 nucleotides in length,
for example, a nucleic acid molecule of at least about 12 nucleotides in length, such as
about 12 to about 30, about 12 to about 50, about 12 to about 60, about 12 to
about 75 or about 12 to about 100 or more nucleotides in length, may be useful in
hybridization, e.g., as probes or primers; for instance, to detect the presence or
absence of Candida albicans in a sample or to determine the presence or absence of
retrotransposons of the invention in a sample (amplification or detection of Candida
albicans and/or inventive retrotransposons). The diagnostic applicability of nucleic
acid molecules of the invention is a very real-world use of the inventive nucleic acid
molecules.
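One simple way to pick candidate probe or primer sequences that are unique to a target, as contemplated here, is to enumerate every window of a chosen length and keep those that occur only once in the target (counting both strands) and never in a set of background sequences. This Python sketch assumes exact matching only; practical probe design would also consider mismatches, melting temperature and secondary structure. The default window length of 12 follows the minimum length mentioned above, and the names are illustrative.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def windows(seq: str, k: int) -> list[str]:
    """All overlapping subsequences of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def candidate_probes(target: str, background: list[str], k: int = 12) -> list[tuple[int, str]]:
    """(position, sequence) of k-mers unique to `target` and absent from `background`."""
    if k < 1:
        raise ValueError("k must be positive")
    target = target.upper()
    # Count occurrences on both strands of the target.
    counts = Counter(windows(target, k) + windows(reverse_complement(target), k))
    excluded: set[str] = set()
    for seq in background:
        seq = seq.upper()
        excluded.update(windows(seq, k))
        excluded.update(windows(reverse_complement(seq), k))
    return [(i, w) for i, w in enumerate(windows(target, k))
            if counts[w] == 1 and w not in excluded]


if __name__ == "__main__":
    # Toy example with k shortened to 3 for readability (hypothetical sequences).
    print(candidate_probes("AACCGT", ["TAACT"], k=3))
    # [(1, 'ACC'), (2, 'CCG'), (3, 'CGT')]
```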

Further, the invention comprehends use of nucleic acid molecules and/or
retrotransposons as vectors, e.g., containing and/or expressing such an exogenous or
heterologous (as to Candida albicans or as to the cell) or homologous (e.g., as to an
organism or animal or cell) nucleic acid molecule, e.g., the use of a recombinant
retrotransposon of the invention as a vector for delivery of a nucleic acid molecule
that is exogenous or heterologous or even homologous to a cell, organism or animal,
for instance, to elicit an immunogenic, immunological or protective immune response
(e.g., from expression of an exogenous or heterologous nucleic acid molecule
encoding an epitope of interest or an antigen) or as a therapeutic (e.g., to express a
homologous nucleic acid molecule such as interferon or a gene that may need to be
expressed in a particular individual).

Even further still, the invention comprehends use of the retrotransposons to contain
and/or express a nucleic acid molecule deleterious to Candida albicans, e.g., so that
the retrotransposon can become integrated into the Candida albicans genome and be
lethal to Candida albicans; for instance, as a form of treatment against Candida
albicans. The therapeutic, immunogenic, immunological or vaccine compositions can
contain the retrotransposon in amounts and in carriers or vehicles analogous to those
employed in herein cited documents.

The nucleic acids used for hybridization can, of course, be conveniently labelled by
incorporating or attaching a marker, e.g., a radioactive or other marker. Such markers
are well known in the art. The labelling of said nucleic acid molecules can be effected
by conventional methods. The presence or expression of Candida albicans or of
retrotransposons thereof (such as inventive retrotransposons) can be monitored by
using a primer pair that specifically hybridizes and by carrying out a PCR reaction
according to standard procedures. Specific hybridization of the above-mentioned
probes or primers preferably occurs under stringent hybridization conditions. A probe or
primer can be any stretch of at least 8, preferably at least 10, more preferably at least
12, 13, 14 or 15, such as at least 20, e.g., at least 23 or 25, for instance at least
27 or 30 nucleotides in a herein defined nucleic acid molecule which are unique
thereto. As to PCR or hybridization primers or probes and optimal lengths therefor,
reference is also made to Kajimura et al., GATA 7(4):71-79 (1990), incorporated
herein by reference.
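The PCR-based monitoring described above can be illustrated by an in silico check of the product a primer pair would be expected to amplify from a template. This Python sketch assumes exact, perfectly matched primer sites on the strand given, with the reverse primer written 5' to 3' as it would be ordered, so that its reverse complement must appear downstream of the forward site. Real primer annealing tolerates some mismatches, which the sketch ignores; the template and the deliberately short primers are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def predict_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Expected PCR product on `template`, or None if either exact site is missing."""
    template, forward, reverse = template.upper(), forward.upper(), reverse.upper()
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the complementary strand, so its reverse
    # complement is searched for downstream of the forward primer site.
    rev_site = reverse_complement(reverse)
    end = template.find(rev_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


if __name__ == "__main__":
    # Hypothetical template and primers.
    print(predict_amplicon("TTTATGCCAAAGGGTTTCCGATTT", "ATGCC", "CGGAA"))
    # ATGCCAAAGGGTTTCCG
```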

With respect to hybridization, it is advantageously carried out under high stringency
conditions; hybridizing or hybridization under high stringency conditions can be synonymous
with stringent hybridization conditions, terms which are well known in the art; see,
for example, Sambrook, "Molecular Cloning, A Laboratory Manual", second ed., CSH
Press, Cold Spring Harbor, 1989; "Nucleic Acid Hybridisation, A Practical Approach",
Hames and Higgins eds., IRL Press, Oxford, 1985; both incorporated herein by
reference.

With respect to therapeutic, immunogenic, immunological and vaccine formulations, in
addition and/or as an alternative to employing compositions and amounts of
retrotransposon and routes of administration as in herein cited documents, it is noted
that classical formulations, e.g., classical immunogenic, immunological, vaccine
or therapeutic formulations containing an antigen or epitope of interest (e.g., subunit
formulations) or containing a biologically active therapeutic, typically contain the
active ingredient in an amount on the order of micrograms to milligrams, such as 5
micrograms to 500 milligrams, or about 0.001 to about 20 wt%, preferably about
0.01 to about 10 wt%, and most preferably about 0.05 to about 5 wt%; in
compositions involving a recombinant, such as a recombinant viral vector expressing
an antigen, epitope of interest or biologically active molecule, the vector is
administered in an amount of at least about 10³ pfu, more preferably about 10⁴ pfu to
about 10¹⁰ pfu, e.g., about 10⁵ pfu to about 10⁹ pfu, for instance about 10⁶ pfu to
about 10⁸ pfu; and in DNA plasmid compositions, suitable quantities of plasmid DNA
can be 1 µg to 100 mg, preferably 0.1 to 10 mg, e.g., 500
micrograms, but lower levels such as 0.1 to 2 mg or preferably 1-10 µg may be
employed. Accordingly, the recombinant retrotransposons of the invention can be
administered in dosages sufficient to elicit a response analogous to compositions
wherein the antigen, epitope of interest or biologically active molecule is directly
present; or to have expression analogous to dosages in such compositions; or to have
expression analogous to expression obtained in vivo by recombinant viral or DNA
plasmid compositions.

Of course, for any composition to be administered to an animal or human, including
the components thereof, and for any particular method of administration, it is
preferred to determine therefor: toxicity, such as by determining the lethal dose (LD)
and LD50 in a suitable animal model, e.g., a rodent such as a mouse; and the dosage of
the composition(s), concentration of components therein and timing of administering
the composition(s) which elicit a suitable response, e.g., a suitable immunological or
therapeutic response, such as by titrations of sera and analysis thereof, e.g., for
antibodies or antigens or epitopes of interest or the therapeutic molecule. Such
determinations do not require undue experimentation from the knowledge of the
skilled artisan, this disclosure and the documents cited herein. And the time for
sequential administrations can be ascertained without undue experimentation using
similar analyses. Thus, the amount of retrotransposon in the inventive compositions
and the dosages administered can be determined by techniques well known to those
skilled in the medical or veterinary arts, taking into consideration such factors as
the particular antigen, epitope of interest or therapeutic being expressed, the carrier
or diluent, any adjuvant (if present), the age, sex, weight, species and condition of
the particular patient, and the route of administration.
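As one illustration of how an LD50 might be estimated from dose-mortality data obtained in an animal model, the following Python sketch interpolates linearly on a log10 dose scale between the two adjacent doses that bracket 50% mortality. This deliberately simple method is chosen for illustration only (probit or logistic regression are common alternatives), and the doses and mortality fractions in the usage example are hypothetical.

```python
import math


def ld50_log_interpolation(doses: list[float], mortality: list[float]) -> float:
    """Estimate the dose giving 50% mortality by interpolating on a log10 dose scale.

    `mortality` holds the fraction of animals dying at each dose (0.0 to 1.0).
    The first pair of adjacent doses (in increasing order) whose mortality
    brackets 0.5 is used.
    """
    if len(doses) != len(mortality) or len(doses) < 2:
        raise ValueError("need at least two dose/mortality pairs of equal length")
    if any(d <= 0 for d in doses):
        raise ValueError("doses must be positive to take logarithms")
    pairs = sorted(zip(doses, mortality))
    for (d_lo, m_lo), (d_hi, m_hi) in zip(pairs, pairs[1:]):
        if m_lo <= 0.5 <= m_hi and m_hi > m_lo:
            frac = (0.5 - m_lo) / (m_hi - m_lo)
            log_ld50 = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_ld50
    raise ValueError("the data do not bracket 50% mortality")


if __name__ == "__main__":
    # Hypothetical doses (mg/kg) and observed mortality fractions.
    print(round(ld50_log_interpolation([1, 10, 100], [0.1, 0.4, 0.8]), 1))  # 17.8
```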






tjk0879 



PATENT 
674521-2001.1 

Examples of compositions of the invention include liquid preparations for orifice, e.g.,
oral, nasal, anal, vaginal, peroral, intragastric, mucosal (e.g., perlingual, alveolar,
gingival, olfactory or respiratory mucosa), etc., administration, such as suspensions,
syrups or elixirs; and preparations for parenteral, subcutaneous, intradermal,
intramuscular or intravenous administration (e.g., injectable administration), such as
sterile suspensions or emulsions. Such compositions may be in admixture with a
suitable carrier, diluent or excipient such as sterile water, physiological saline,
glucose or the like. The compositions can also be lyophilized. The compositions can
contain auxiliary substances such as wetting or emulsifying agents, pH buffering
agents, gelling or viscosity enhancing additives, preservatives, flavoring agents,
colors and the like, depending upon the route of administration and the preparation
desired. Standard texts, such as "REMINGTON'S PHARMACEUTICAL SCIENCES",
17th edition, 1985, incorporated herein by reference, may be consulted to prepare
suitable preparations without undue experimentation.

Compositions of the invention are conveniently provided as liquid preparations, e.g.,
isotonic aqueous solutions, suspensions, emulsions or viscous compositions, which
may be buffered to a selected pH. If digestive tract absorption is preferred,
compositions of the invention can be in the "solid" form of pills, tablets, capsules,
caplets and the like, including "solid" preparations which are time-released or which
have a liquid filling, e.g., gelatin-covered liquid, whereby the gelatin is dissolved in the
stomach for delivery to the gut. If nasal or respiratory (mucosal) administration is
desired, compositions may be in a form dispensed by a squeeze spray dispenser,
pump dispenser or aerosol dispenser. Aerosols are usually pressurized by means
of a hydrocarbon. Pump dispensers can preferably dispense a metered dose or a
dose having a particular particle size.

Compositions of the invention can contain pharmaceutically acceptable flavors and/or 
colors for rendering them more appealing, especially if they are administered orally. 
The viscous compositions may be in the form of gels, lotions, ointments, creams and
the like and will typically contain a sufficient amount of a thickening agent so that the
viscosity is from about 2500 to 6500 cps, although more viscous compositions, even
up to 10,000 cps, may be employed. Viscous compositions preferably have a viscosity
of 2500 to 5000 cps, since above that range they become more difficult to
administer. However, above that range, the compositions can approach solid or
gelatin forms, which are then easily administered as a swallowed pill for oral ingestion.

Liquid preparations are normally easier to prepare than gels, other viscous
compositions and solid compositions. Additionally, liquid compositions are
somewhat more convenient to administer, especially by injection or orally, to animals,
children, particularly small children, and others who may have difficulty swallowing a
pill, tablet, capsule or the like, or in multi-dose situations. Viscous compositions, on
the other hand, can be formulated within the appropriate viscosity range to provide
longer contact periods with mucosa, such as the lining of the stomach or nasal
mucosa.

Obviously, the choice of suitable carriers and other additives will depend on the exact
route of administration and the nature of the particular dosage form, e.g., liquid
dosage form (e.g., whether the composition is to be formulated into a solution, a
suspension, gel or another liquid form), or solid dosage form (e.g., whether the
composition is to be formulated into a pill, tablet, capsule, caplet, time release form or
liquid-filled form).

Solutions, suspensions and gels normally contain a major amount of water (preferably
purified water) in addition to the retrotransposon. Minor amounts of other ingredients
such as pH adjusters (e.g., a base such as NaOH), emulsifiers or dispersing agents,
buffering agents, preservatives, wetting agents, gelling agents (e.g., methylcellulose),
colors and/or flavors may also be present. The compositions can be isotonic, i.e., they
can have the same osmotic pressure as blood and lacrimal fluid.

The desired isotonicity of the compositions of this invention may be accomplished
using sodium chloride, or other pharmaceutically acceptable agents such as dextrose,
boric acid, sodium tartrate, propylene glycol or other inorganic or organic solutes.
Sodium chloride is preferred particularly for buffers containing sodium ions.

Viscosity of the compositions may be maintained at the selected level using a
pharmaceutically acceptable thickening agent. Methylcellulose is preferred because it
is readily and economically available and is easy to work with. Other suitable
thickening agents include, for example, xanthan gum, carboxymethyl cellulose,
hydroxypropyl cellulose, carbomer and the like. The preferred concentration of the
thickener will depend upon the agent selected. The important point is to use an
amount which will achieve the selected viscosity. Viscous compositions are normally
prepared from solutions by the addition of such thickening agents.

A pharmaceutically acceptable preservative can be employed to increase the shelf-life
of the compositions. Benzyl alcohol may be suitable, although a variety of
preservatives including, for example, parabens, thimerosal, chlorobutanol, or
benzalkonium chloride may also be employed. A suitable concentration of the 
preservative will be from 0.02% to 2% based on the total weight although there may 
be appreciable variation depending upon the agent selected. 
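The stated range of 0.02% to 2% of the total weight translates directly into a mass range for a given batch. A minimal sketch, assuming the percentages are weight/weight of the finished composition; the function name and the 500 g batch size are illustrative only.

```python
def preservative_range_g(total_weight_g, low_pct=0.02, high_pct=2.0):
    """Return (minimum, maximum) grams of preservative for a batch whose
    finished total weight is `total_weight_g`, using a w/w percentage range."""
    if total_weight_g <= 0:
        raise ValueError("total weight must be positive")
    return (total_weight_g * low_pct / 100.0,
            total_weight_g * high_pct / 100.0)


low_g, high_g = preservative_range_g(500.0)
print(f"{low_g:.2f} g to {high_g:.2f} g of preservative per 500 g batch")
```

These limits are only the general range given above; as the text notes, the appropriate level varies with the preservative chosen.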

Those skilled in the art will recognize that the components of the compositions must
be selected to be chemically inert with respect to the retrotransposon. This will 
present no problem to those skilled in chemical and pharmaceutical principles, or 
problems can be readily avoided by reference to standard texts or by simple 
experiments (not involving undue experimentation), from this disclosure and the
documents cited herein. 



The compositions of this invention are prepared by mixing the ingredients following
generally accepted procedures. For example, the selected components may be simply
mixed in a blender or other standard device to produce a concentrated mixture which
may then be adjusted to the final concentration and viscosity by the addition of water
or thickening agent and possibly a buffer to control pH or an additional solute to
control tonicity. Generally the pH may be from about 3 to 7.5. Compositions can be
administered in dosages and by techniques well known to those skilled in the medical
and veterinary arts taking into consideration such factors as the age, sex, weight, and
condition of the particular patient or animal, and the composition form used for
administration (e.g., solid vs. liquid). Dosages for humans or other mammals can be
determined without undue experimentation by the skilled artisan, from this disclosure,
the documents cited herein, and the Examples below.

The inventive retrotransposons can contain and preferably express at least one nucleic
acid molecule encoding an antigen or epitope of interest. An epitope of interest is an
immunologically relevant region of an antigen or immunogen or immunologically active
fragment thereof, e.g., from a pathogen or toxin of veterinary or human interest. An
epitope of interest can be from an antigen of a pathogen or toxin, or from another
antigen or toxin which elicits a response with respect to the pathogen or toxin, e.g.,
from an antigen of a first human or veterinary pathogen or toxin that elicits a
response with respect to the pathogen or toxin in question (such as a measles virus
antigen or epitope of interest eliciting an immunological response against canine
distemper). Thus, for instance, an epitope of interest can be from: a Morbillivirus
antigen, e.g., a canine distemper virus or measles or rinderpest antigen such as HA or
F; a rabies glycoprotein, e.g., rabies glycoprotein G; an avian influenza antigen, e.g.,
turkey influenza HA, Chicken/Pennsylvania/1/83 influenza antigen such as a
nucleoprotein (NP) or influenza A/Jalisco/95 H5 hemagglutinin; a human influenza
antigen such as HA and/or NA; a bovine leukemia virus antigen, e.g., gp51,30
envelope; a Newcastle Disease Virus (NDV) antigen, e.g., HN or F; a feline leukemia
virus (FeLV) antigen, e.g., FeLV envelope protein; a Rous-associated virus antigen
such as RAV-1 env; matrix and/or peplomer of infectious bronchitis virus; a
Herpesvirus glycoprotein, e.g., a glycoprotein, for instance from feline herpesvirus,
equine herpesvirus, bovine herpesvirus, pseudorabies virus, canine herpesvirus, HSV,
Marek's Disease Virus, herpesvirus of turkeys (HVT) or cytomegalovirus; a flavivirus
antigen, e.g., a Japanese encephalitis virus (JEV) antigen, a Yellow Fever antigen, or
a Dengue virus antigen; a malaria (Plasmodium) antigen; an immunodeficiency virus
antigen, e.g., a feline immunodeficiency virus (FIV) antigen or a simian
immunodeficiency virus (SIV) antigen or a human immunodeficiency virus (HIV) antigen
such as gp120 or gp160; a parvovirus antigen, e.g., canine parvovirus; an equine
influenza antigen; a poxvirus antigen, e.g., an ectromelia antigen, a canary pox virus
antigen or a fowl pox virus antigen; an infectious bursal disease virus antigen, e.g.,
VP2, VP3, VP4; a Hepatitis virus antigen, e.g., HBsAg; a Hantaan virus antigen; a C.
tetani antigen; a mumps antigen; a pneumococcal antigen, e.g., PspA; a Borrelia
antigen, e.g., OspA, OspB, OspC of Borrelia associated with Lyme disease such as
Borrelia burgdorferi, Borrelia afzelii and Borrelia garinii; or a chicken pox (varicella zoster)
antigen. Of course, this is intended as exemplary, as the epitope of interest
expressed by an inventive retrotransposon can be derived from any antigen of any
veterinary or human pathogen or toxin; and the recombinant retrotransposon can
express an antigen of any veterinary or human pathogen or toxin. Thus, it is
envisioned that the inventive recombinant retrotransposon contains at least one nucleic
acid molecule encoding at least one antigen or epitope of interest.

With respect to DNA encoding epitopes of interest, antigens and/or therapeutics,
attention is directed to documents cited herein, see, e.g., documents cited supra and
documents cited infra, for instance: U.S. Patents Nos. 5,174,993 and 5,505,941
(e.g., rabies glycoprotein (G) gene, turkey influenza hemagglutinin gene, gp51,30
envelope gene of bovine leukemia virus, Newcastle Disease Virus (NDV) antigen,
FeLV envelope gene, RAV-1 env gene, NP (nucleoprotein gene of
Chicken/Pennsylvania/1/83 influenza virus), matrix and peplomer gene of infectious
bronchitis virus; HSV gD); U.S. Patent No. 5,338,683 (e.g., DNA encoding
Herpesvirus glycoproteins, inter alia); U.S. Patents Nos. 5,494,807, 5,756,103,
5,762,938 and 5,766,599 (e.g., DNA encoding antigens from rabies, Hepatitis B,
JEV, YF, Dengue, measles, pseudorabies, Epstein-Barr, HSV, HIV, SIV, EHV, BHV,
HCMV, canine parvovirus, equine influenza, FeLV, FHV, Hantaan, C. tetani, avian
influenza, mumps, NDV, inter alia); U.S. Patents Nos. 5,503,834 and 5,759,841
(e.g., Morbillivirus, e.g., measles F, hemagglutinin, inter alia); U.S. Patent No.
4,722,848 (e.g., HSV tk, HSV glycoproteins, e.g., gB, gD, influenza HA, Hepatitis B,
e.g., HBsAg, inter alia); U.S. Patents Nos. 5,514,375, 5,744,140 and 5,744,141
(e.g., flavivirus structural proteins); U.S. Patents Nos. 5,766,598 and 5,863,542
(e.g., Lentivirus antigens such as immunodeficiency virus antigens, inter alia); U.S.
Patents Nos. 5,658,572 and 5,641,490 (e.g., IBDV antigens, inter alia); U.S. Patent
No. 5,833,975 (e.g., cytokine and/or tumor associated antigens, inter alia); U.S.
Patents Nos. 5,688,920 and 5,529,780 (e.g., canine herpesvirus antigens); PCT
publication WO 96/3941 (e.g., cytomegalovirus antigens); and U.S. Patents Nos.
5,756,101 and 5,766,597 (Plasmodium antigens). Thus, the skilled artisan can
obtain DNA or a nucleic acid molecule for inclusion in an inventive retrotransposon
without any undue experimentation.

As to epitopes of interest, one skilled in the art can determine an epitope or
immunodominant region of a peptide or polypeptide and ergo the coding nucleic acid
molecule or DNA therefor from knowledge in the art, without undue experimentation,
for instance, from the amino acid sequence of the peptide or polypeptide and corresponding
nucleic acid molecule or DNA sequences coding for the peptide or polypeptide, as well
as from the nature of particular amino acids (e.g., size, charge, etc.) and the codon
dictionary, inter alia; in respect to this, attention is directed to documents cited
herein, including U.S. Patent No. 5,955,089. Accordingly, one skilled in the art can
obtain an epitope of interest and a nucleic acid molecule coding therefor without any
undue experimentation.


Thus, the invention comprehends an immunogenic, immunological, vaccine or 
therapeutic composition comprising an inventive retrotransposon of the invention 
wherein the retrotransposon includes a nucleic acid molecule encoding at least one 
antigen or epitope of interest or therapeutic molecule. The invention further 
comprehends a method for inducing an immunological or immune or protective
immune or therapeutic response comprising administering to a host such as an animal 
or human an inventive retrotransposon of the invention wherein the retrotransposon 
includes a nucleic acid molecule encoding at least one antigen or epitope of interest or 
therapeutic molecule. 


The retrotransposon can be expressed in any suitable cell, such as a eukaryotic
cell; for instance, fungus or yeast cells such as Saccharomyces cerevisiae cells,
Saccharomyces pastorianus cells and Candida albicans cells; vertebrate cells such as fish
cells (e.g., shark, salmon, rainbow trout, zebrafish, herring, mackerel cells), amphibian
cells (e.g., frog, toad, salamander cells), bird or avian cells (e.g., chicken, turkey, duck,
pigeon, dove cells), reptile cells (e.g., snake such as cobra), and mammalian cells (e.g.,
human, rabbit, hamster, mouse, rat and primate cells, and cell lines such as VERO, HeLa,
Chinese hamster ovary (CHO), WI-38, BHK, COS-7, 293 and MDCK); invertebrate cells
such as land invertebrate cells, for instance, insect cells, e.g., lepidopteran cells such
as Spodoptera (e.g., Spodoptera frugiperda) or Trichoplusia (e.g., Trichoplusia ni),
dipteran cells such as mosquito (e.g., Culicidae) cells, fly cells (e.g., Drosophila); e.g., typical
cells that are used with eukaryotic replicable expression vectors such as S. frugiperda
cells, VERO cells, MRC-5 cells, SCV-1 cells, COS-1 cells, NIH3T3 cells, mouse L cells,
HeLa cells and the like.
The invention further comprehends methods for treating Candida albicans comprising
administering a recombinant retrotransposon of the invention that includes a nucleic
acid molecule that is lethal or deleterious to Candida albicans, as well as recombinant
retrotransposons that include a nucleic acid molecule that is lethal or deleterious to
Candida albicans. For instance, a retrotransposon of the invention can disrupt or
interfere with a gene essential to the viability of Candida albicans; for instance, an
inventive retrotransposon can disrupt or interfere with CaSNF1 (Petter et al. Infect
Immun 65(12):4909-17 (1997)) and/or H(+)-ATPase (Perlin et al. Ann NY Acad Sci
834:609-17 (1997)) and/or the Candida albicans 37 kDa polypeptide that appears to
be a ribosomal protein (Montero et al. Microbiology 144(Pt4):839-47 (1998)) and/or a
Candida albicans topoisomerase gene (Keller et al. Biochem J 324(Pt1):329-39
(1997)) and/or a yeast essential gene (cf. Hanes et al. Yeast 5:55-72 (1989)); and/or
an inventive retrotransposon can express a candidacidal antibody (Conti et al. J Infect
Dis 177(3):807-11 (1998)) and/or an antifungal (Ben-Josef et al. J Antibiot (Tokyo)
50(11):937-43 (1997)) and/or an antibody-like molecule (Tournay et al. DNA Cell Biol
15(8):617-24 (1996)).

Furthermore, in view of the foregoing and the documents cited herein, the invention
comprehends a process for the transfer and expression of at least one gene into a cell
in vitro or in vivo comprising the steps of: (a) isolating the gene; (b) introducing the
gene into an inventive retrotransposon (a retrotransposon as herein described); (c)
introducing said hybrid retrotransposon into a donor cell and allowing the donor cell to
package and transmit said hybrid retrotransposon into a virion; (d) transferring said
virion to a recipient cell wherein said hybrid retrotransposon replicates by reverse
transcription and may also be integrated into the recipient cell's genome; (e)
expressing said hybrid retrotransposon as RNA and/or protein from at least one
internal promoter and/or from said retrotransposon long terminal repeat promoter
(or a promoter as herein described); and (f) screening or selecting for the
phenotype of said hybrid retrotransposon. The retrotransposon can contain genetic
material encoding at least one dominant selectable marker; e.g., the selectable marker can be
selected from the group consisting of aminoglycoside phosphotransferase (neo, G418,
APH), dihydrofolate reductase (DHFR), hygromycin-B-phosphotransferase (HPH),
thymidine kinase (TK), xanthine-guanine phosphoribosyltransferase (XGPRT, gpt),
chloramphenicol acetyltransferase (CAT) and luciferase. In the process, multiple
cellular movable genetic elements can be introduced and expressed as RNA; for
instance, the multiple cellular movable genetic elements can be introduced and
expressed in tandem in RNA; or, the multiple cellular movable genetic elements can
be introduced and expressed as separate transcriptional units within a single cell or
organism. And, the gene can encode a peptide, antibody, antigen, hormone, or drug
not normally expressed in the cell at biologically significant levels. (Cf. U.S. Patent
5,354,674.)

Similarly, the invention can comprehend a polycistronic vector for the expression of one
or more or a plurality, e.g., at least two or three, polynucleotide sequences, comprising
a promoter operably linked to a nucleotide sequence comprising elements encoding
one, or two or three, or more proteins, and an inventive retrotransposon or portion
thereof; the retrotransposon or portion thereof can act as an internal ribosome entry
site. The invention thus further comprehends a method of incorporating a DNA
encoding a protein of interest into a cell in vitro comprising transforming said cell with
this vector. The vector can be a plasmid vector or a viral vector; for instance, a
vector from a virus selected from the group consisting of poxvirus, adenovirus,
baculovirus, herpesvirus, adeno-associated virus, and retrovirus. The vector can
include an encapsidation sequence. A viral particle can comprise the vector. An
isolated cell can comprise the vector. And, the vector can be in a composition. (Cf.
U.S. Patent No. 5,925,565.) Likewise, the invention comprehends other methods,
products, compositions and the like that are analogous to those in documents cited
herein, but wherein retrotransposons, nucleic acid molecules, amino acid molecules
(proteins, polypeptides) and promoters disclosed herein are employed.

Further, as discussed, the invention can include an immunological, or immunogenic, or
vaccine or therapeutic composition comprising a carrier or diluent and an inventive
expression vector wherein the vector expresses an antigen, or an epitope of interest
or a therapeutic. The composition can be an immunological, immunogenic or vaccine
composition when the vector expresses an antigen or an epitope of interest (see
supra). The composition can be a therapeutic composition when the vector expresses
a therapeutic (e.g., interferon, a cytokine, a tumor associated antigen, etc.; see
supra). And, the invention can include a method for inducing an immunological
response in a host including an animal (e.g., mammal) or a human comprising
administering to the host the immunological, immunogenic or vaccine composition; as
well as a method for inducing a therapeutic response in a host including an animal
(e.g., mammal) or human comprising administering to the host the therapeutic
composition. As noted in many documents cited herein, an immunological or
immunogenic response can be useful; for instance, in generating antibodies which are
themselves useful in diagnostic and other uses.

Accordingly, the invention has many embodiments and uses that can be practised
without undue experimentation from this disclosure and the knowledge in the art, for 
instance as exemplified by documents cited and incorporated herein by reference. 

A better understanding of the present invention and of its many advantages will be 
had from the following non-limiting Examples, given as a further description of the
invention and as illustration of it. 

Plasmids carrying both the retrotransposon and other genetic elements can be
assembled by in vitro molecular genetic manipulations. For ease of manipulation, such
plasmids should be capable of growing both in E. coli and in yeasts. Such
plasmids should carry some suitable marker (such as ADE2) which can be selected for
following yeast transformation. The presence of such plasmids can be detected and
selected for following transformation into an Ade- (adenine-auxotrophic) yeast.
Detection or selection consists of allowing the yeasts to attempt to grow on media
without, e.g., adenine. The parental auxotrophic yeast will not grow, whereas a
transformant carrying, e.g., a plasmid with the ADE2 gene will grow. The transformed
culture can be maintained on a medium without adenine, and this will select for the
retention of the plasmid. Strains carrying the plasmid (maintained, e.g., by selection on
medium without adenine) can be used to perform the various activities described in
this patent. For example, they could be plated on a medium which would select for
integration events (e.g., by selecting for URA3+).
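The selection logic described above can be summarised as a small toy model: a strain grows on medium lacking adenine only if something supplies a functional ADE2 gene. This is an illustrative sketch, not a laboratory protocol; the class, field and strain names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strain:
    name: str
    chromosomal_ade2: bool  # False for an Ade- (ade2/ade2) host
    plasmid_markers: frozenset = frozenset()


def grows_without_adenine(strain: Strain) -> bool:
    """Adenine prototrophy requires ADE2 from the chromosome or a plasmid."""
    return strain.chromosomal_ade2 or "ADE2" in strain.plasmid_markers


def select_on_minus_adenine(strains):
    """Names of the strains expected to form colonies on medium without adenine."""
    return [s.name for s in strains if grows_without_adenine(s)]


parent = Strain("parental Ade- host", chromosomal_ade2=False)
transformant = Strain("transformant", chromosomal_ade2=False,
                      plasmid_markers=frozenset({"ADE2"}))
print(select_on_minus_adenine([parent, transformant]))
```

A second screen, e.g., for URA3+ integration events, would follow the same pattern with a different marker name.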

EXAMPLES 


MATERIALS AND METHODS 
Strains and culture conditions 

The isolate iB65, precursor to the Candida albicans strain currently under investigation
(hOG1042), was isolated as a met2 heterozygote from an Otago University
intermediate biology student in 1983. It was subsequently mutagenised with UV
radiation (2) and N-methyl-N'-nitro-N-nitrosoguanidine (Poulter et al 1981) to produce
five strains (hOG758, hOG759, hOG760, hOG761 and hOG762) which are all met2
homozygotes and also auxotrophic for adenine. hOG1042 is an ade2/ade2
MET2/met2 revertant of hOG762. The strains were grown at 27°C or 37°C in YPD
medium (1% yeast extract, 2% peptone and 2% glucose).

Other Candida albicans strains analyzed were F16932 (Poulter, unpublished), SA40 
(Agatensi et al 1991), SC5314 (Gillum et al 1984), and SGY269 (Kelly et al 1987). 
Other Candida species analyzed were C. pseudotropicalis (CDC B2455), C. tropicalis
(CDC B397) and C. parapsilosis (CDC MCC 499), all from the National Health Institute,
Porirua, New Zealand, and C. maltosa (CHAU1).

Enzymes

Agarase (GELase™) and phosphatase (HK™ Phosphatase) were purchased from
Epicentre Technologies, USA. T4 DNA ligase, the Expand High Fidelity PCR System, RNase
A, DNase I, Proteinase K, Klenow fragment, and restriction endonucleases were purchased
from Boehringer Mannheim GmbH, Biochemica, West Germany. Vent® polymerase
was purchased from New England Biolabs, USA. Zymolyase 100T was from
Seikagaku Corporation, Tokyo.

Nucleic acid manipulations 

C. albicans genomic DNA was prepared essentially by the method of Cryer et al.
(1975). DNA was separated on 1% agarose gels using TAE buffer. DNA was gel-purified
from low-melting-point agarose using agarase. Bacterial plasmids were
prepared by a modified alkaline lysis/PEG precipitation method from Applied Biosystems, Inc.
Polymerase chain reactions were performed using an Autogene II programmable
cycling water bath from Grant Instruments (Cambridge) Ltd. Temperature cycling
consisted of 35 cycles of 95°C for 1 min, 45°C for 1 min and 72°C for 1 min. PCR
products were purified for sequencing using the QIAquick PCR Purification Kit from
QIAGEN GmbH, Hilden.
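The cycling programme above amounts to 105 min of hold time (35 cycles of three 1-min steps). A minimal sketch that represents the programme as data and totals the hold times; it deliberately ignores ramping or bath-transfer time on the water-bath cycler, which is not given in the text, so the real run time will be somewhat longer. The function names are illustrative.

```python
# (step name, temperature in °C, hold time in seconds), from the protocol above
CYCLE = [("denature", 95, 60), ("anneal", 45, 60), ("extend", 72, 60)]
N_CYCLES = 35


def total_hold_minutes(cycle, n_cycles):
    """Sum of hold times across all cycles; ramp/transfer time not included."""
    return n_cycles * sum(seconds for _, _, seconds in cycle) / 60.0


def expand_programme(cycle, n_cycles):
    """Flatten the programme into the ordered list of (temperature, seconds) holds."""
    return [(temp, seconds) for _ in range(n_cycles) for _, temp, seconds in cycle]


print(total_hold_minutes(CYCLE, N_CYCLES), "min of hold time")
print(len(expand_programme(CYCLE, N_CYCLES)), "individual holds")
```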

Sequencing and nucleotide analysis 

Sequencing was performed using a combination of subcloning and specifically
designed oligonucleotide primers. The sequences were determined on an automated
DNA sequencer (Applied Biosystems 373A DNA sequencer). Oligonucleotides were
purchased from Macromolecular Resources, Fort Collins, or from the DNA Synthesiser,
Dunedin. Sequences were edited using SeqEd 1.0.3 (Applied Biosystems). Sequence
contigs were assembled using VTUTIN 5.21 (Stockwell 1985) and HOMED 5.14
(Stockwell and Petersen 1987). Other sequence analysis was carried out using
version 8 of the University of Wisconsin GCG Sequence Analysis Package (Devereux
et al 1984). The open reading frames were translated using the non-standard C.
albicans genetic code (CUG codes for serine instead of leucine) (Santos and Tuite
1995 and White et al 1995). Sequences for the alignments in figure 4 and for the
phylogenetic analysis were obtained from the GenBank database using the following
accession numbers: 17.6 - A03971, 1731 - S00954, CfT-1 - Z11866, copia -
A03324, dong - L08889, gypsy - B25666, HIV1 - K02013, Hopscotch - U12626,
jockey - JT0396, MMLV - A03956, Osser - S32437, RSV - S26418, Ta1 - S05465,
Tf1 - A36373, Tnt1 - S04273, Tom - S34639, Tst1 - X52387, Tx1 - B32494, Ty1 -
B28097, Ty2 - S45842, Ty3 - S53577, Ty4 - P47024 and Ty5 - U19263. The trees
were constructed using the UPGMA (unweighted pair group method using arithmetic
averages), Neighbor-Joining and Parsimony methods available in the PHYLIP package
(Felsenstein 1989). Bootstrapping was performed using SEQBOOT and consensus
trees were derived using CONSENSE, both programs also from PHYLIP.
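Because C. albicans reads CUG as serine, a translation that uses the standard table will mis-assign those codons as leucine. A minimal sketch of a translator that builds the standard code and then reassigns CTG (the DNA form of CUG) to serine, which corresponds to NCBI translation table 12; it is not the GCG package's implementation, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, ..., GGG.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(ctg_as_serine=True):
    """Map each DNA codon to a one-letter amino acid ('*' = stop)."""
    table = {"".join(codon): aa
             for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
    if ctg_as_serine:
        table["CTG"] = "S"  # C. albicans reads CUG as serine, not leucine
    return table


def translate(seq, ctg_as_serine=True):
    """Translate a DNA or RNA sequence in frame 1; 'X' marks ambiguous codons."""
    table = codon_table(ctg_as_serine)
    dna = seq.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


print(translate("ATGCTGTAA"))                       # C. albicans reading
print(translate("ATGCTGTAA", ctg_as_serine=False))  # standard reading
```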

The nucleotide sequence of pCal has been submitted to Genbank and assigned the 
accession number AF007776. 
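For readers unfamiliar with UPGMA, one of the tree-building methods named in the sequence-analysis paragraph above, the sketch below shows the core of the algorithm: repeatedly join the two closest clusters, place the new node at half their distance, and set the distance from the merged cluster to every other cluster to the size-weighted average. It is a teaching sketch that assumes a precomputed symmetric distance matrix, not PHYLIP's implementation, and the taxon names and distances in the example are made up.

```python
def upgma(names, matrix):
    """Build an ultrametric tree by UPGMA.

    names: list of n taxon labels.
    matrix: n x n symmetric list of lists of pairwise distances.
    Returns the tree as a Newick string with branch lengths.
    """
    # active cluster id -> (Newick subtree, number of taxa, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # distances between active clusters, keyed by (smaller id, larger id)
    dist = {(i, j): matrix[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        tree_i, size_i, height_i = clusters.pop(i)
        tree_j, size_j, height_j = clusters.pop(j)
        height = dist[(i, j)] / 2.0
        merged = f"({tree_i}:{height - height_i:g},{tree_j}:{height - height_j:g})"
        # size-weighted average distance from the merged cluster to each other one
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[k] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        for k, d in new_dist.items():
            dist[(k, next_id)] = d  # k < next_id, so the key stays ordered
        clusters[next_id] = (merged, size_i + size_j, height)
        next_id += 1
    (tree, _, _), = clusters.values()
    return tree + ";"


# Made-up distances for three hypothetical taxa
print(upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]]))
```

Bootstrapping, as done with SEQBOOT, would repeat the tree building on alignments resampled column-by-column with replacement; that step is not shown here.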


Candida nucleic acid isolations. For DNA isolations, cells were grown at 27°C or 37°C
to late log/early stationary phase. DNA for the hOG759 library was then prepared
essentially as in Cryer et al. 1975. DNA for the Southern blots and PCRs was
prepared as described by Philippsen et al. 1991. To determine the copy number of
TCa2 in hOG759 and hOG1042, it was necessary to purify the
chromosomal DNA away from the abundant pCal molecules. To do this, DNA samples
from cells grown at 27°C were electrophoresed on 0.7% agarose gels. The high
molecular weight chromosomal DNA was then cut out of the gel under long
wavelength UV light. The DNA was then extracted from the gel by spinning through
siliconized glass wool in microcentrifuge tubes for 5 min at 6500rpm and 2 min at
8000rpm. DNA was precipitated by adding an equal volume of 5M ammonium acetate
and 2 volumes of cold 96% ethanol. The tubes were mixed and then centrifuged at
13000rpm for 30 min. Pellets were washed in 70% ethanol, dried, resuspended in
10mM Tris-Cl, pH 7.5; 1mM EDTA and stored at -80°C.

RNA extractions were performed as follows. Cells were grown in YPD medium
overnight at either 27°C or 37°C, then a volume of culture containing ~2.5X1[...] cells
was transferred to Falcon tubes. The cells were spun down, washed once in 1ml RNA
buffer [0.5M NaCl; 200mM Tris-Cl, pH 7.5; 10mM EDTA; treated with diethyl
pyrocarbonate (DEPC)], then resuspended in 300µl RNA buffer and transferred to
Eppendorf tubes. To these tubes were added 200µl RNase-free glass beads (425 to
600µm diameter), 150µl phenol equilibrated with RNA buffer and 150µl
chloroform-isoamyl alcohol (24:1). The tubes were then vortexed in 30 sec bursts, with
intervals on ice, for a total of 5 min vortexing. 30µl of 10% SDS was then added and the
tubes were vortexed for a further 2 min. The organic and aqueous phases were then
separated by centrifuging for 1 min at 13000rpm. The aqueous phase was then
extracted once more with 150µl phenol and 150µl chloroform-isoamyl alcohol. RNA was
precipitated by adding 2 volumes of cold absolute ethanol and holding at -80°C for 20
min. The tubes were then centrifuged for 10 min at 13000rpm; the resulting RNA pellet
was washed in 70% ethanol, dried, resuspended in 50µl DEPC-treated H2O and
stored at -80°C.

RNA preparations were tested for RNase-sensitivity by treating them with 0.2mg.ml-1
RNase A for 30 min at 37°C.

Southern blotting. DNA was electrophoresed in 0.75% agarose with TAE buffer in the
presence of 0.5µg.ml-1 ethidium bromide. When the DNA fragments were sufficiently
separated, the gels were photographed under UV light followed by a 5 min wash in
sterile H2O. The DNA was then capillary transferred to Hybond-N+ nylon membranes
(Amersham) using 0.4M NaOH as the transfer solution. Following transfer, the
membranes were rinsed in 2XSSC and stored at 4°C until hybridization. DNA
fragments to be used as probes were isolated by restriction digestion of plasmid
clones followed by gel purification of the appropriate fragment as described above for
genomic DNA. The locations of the probes used are shown in Fig. 11A. Probes were
radiolabeled with [α-32P]dCTP by random-primed labelling using Hexanucleotide Mix
from Boehringer Mannheim. Prior to hybridization, probes were denatured by heating
in a boiling water bath for 10 min. Hybridization was carried out in sealed plastic bags
in a shaking water bath. Most hybridizations were performed at 65°C, but some lower
stringency ones were at 55°C. The hybridization buffer was similar to that of Church
and Gilbert 1984, but without the BSA (0.36M Na2HPO4, 0.12M NaH2PO4, 1mM
EDTA, 7% SDS). Membranes were prehybridized in this buffer for 2 hours; the
denatured probe was then added in 5ml of fresh buffer and hybridization was allowed
to proceed for 16-20 hours. Post-hybridization washes consisted of two rinses in
2XSSC at room temperature followed by stringency washes in 0.2XSSC (or 0.4XSSC
for low stringency), 0.1% SDS at the hybridization temperature. Finally, membranes
were rinsed in 2XSSC, then exposed to Kodak X-Omat AR film at -80°C using an
intensifying screen. Membranes were stripped for reprobing by rinsing in H2O for 1
min, followed by two washes in 0.2M NaOH, 0.1% SDS at 37°C, and then a final
rinse in 2XSSC.
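The hybridization buffer composition above can be converted into a weighing sheet. A minimal sketch, assuming anhydrous Na2HPO4 and NaH2PO4 and disodium EDTA dihydrate as the EDTA source (the text does not say which salt forms were used), with 7% SDS read as w/v; substitute the molar masses of the hydrates actually on the shelf. Together the two phosphates give 0.48M sodium phosphate.

```python
# Molar masses in g/mol; the salt forms are assumptions, not from the protocol.
MOLAR_MASS = {
    "Na2HPO4 (anhydrous)": 141.96,
    "NaH2PO4 (anhydrous)": 119.98,
    "Na2EDTA.2H2O": 372.24,
}

# Target molarities from the buffer described above (mol/l).
MOLARITY = {
    "Na2HPO4 (anhydrous)": 0.36,
    "NaH2PO4 (anhydrous)": 0.12,
    "Na2EDTA.2H2O": 0.001,
}

SDS_PERCENT_W_V = 7.0  # 7% SDS, read as g per 100 ml


def buffer_recipe(volume_l=1.0):
    """Grams of each component needed to make `volume_l` litres of buffer."""
    recipe = {name: MOLARITY[name] * volume_l * MOLAR_MASS[name] for name in MOLARITY}
    recipe["SDS"] = SDS_PERCENT_W_V * 10.0 * volume_l  # g/100 ml -> g/l
    return recipe


for component, grams in buffer_recipe(1.0).items():
    print(f"{component}: {grams:.2f} g")
```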



Northern blotting. Briefly, approximately equal amounts of total RNA were denatured
in formamide-formaldehyde at 65°C, then separated on 1% agarose, 2.2M
formaldehyde gels in MOPS running buffer (40mM 3-[N-morpholino]propanesulfonic
acid, pH 7.0; 10mM sodium acetate; 1mM EDTA). Following electrophoresis, gels
were washed twice, 20 min per wash, in RNase-free H2O. RNA was then capillary
transferred for 5 hours to Hybond-N+ membranes using 8mM NaOH as the transfer
solution. The membranes were then rinsed in 2XSSC, 0.1% SDS for 5 min. The RNA
sides of the membranes were then exposed to UV light for 45-60 sec and the
membranes were stored at 4°C until hybridization. Probes were radiolabeled
double-stranded DNAs prepared as described above for Southern blotting. Hybridization was
performed at 42°C in FPH buffer (5XSSC, 5X Denhardt's solution, 50% formamide and
1% SDS). Membranes were prehybridized for 2 hours in this buffer; the denatured
probe was then added in 5ml of fresh FPH buffer and hybridization was left to
proceed for about 20 hours. After hybridization the membranes were washed twice, 5
min per wash, in 2XSSC at room temperature; twice, 5 min per wash, in 0.2XSSC,
0.1% SDS at room temperature; and twice, 15 min per wash, in 0.2XSSC, 0.1% SDS
at 42°C. Finally, the membranes were rinsed in 2XSSC and exposed to X-ray film at
-80°C.
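FPH buffer can be assembled from concentrated stocks with the usual C1V1 = C2V2 dilution rule. A minimal sketch; the stock concentrations (20X SSC, 50X Denhardt's, 100% formamide, 10% SDS) and the 50 ml batch size are assumptions chosen for illustration, not values given in the protocol.

```python
def stock_volume_ml(final_conc, stock_conc, final_volume_ml):
    """Volume of stock to add, from C1*V1 = C2*V2."""
    if stock_conc < final_conc:
        raise ValueError("stock is less concentrated than the target")
    return final_conc * final_volume_ml / stock_conc


# component -> (final concentration, ASSUMED stock concentration), same units
FPH_COMPONENTS = {
    "SSC (X)": (5, 20),
    "Denhardt's (X)": (5, 50),
    "formamide (%)": (50, 100),
    "SDS (%)": (1, 10),
}


def fph_recipe(final_volume_ml=50.0):
    """Millilitres of each stock, plus water, for `final_volume_ml` of FPH buffer."""
    volumes = {name: stock_volume_ml(final, stock, final_volume_ml)
               for name, (final, stock) in FPH_COMPONENTS.items()}
    volumes["water"] = final_volume_ml - sum(volumes.values())
    if volumes["water"] < 0:
        raise ValueError("stocks too dilute to reach the target volume")
    return volumes


for component, ml in fph_recipe(50.0).items():
    print(f"{component}: {ml:.1f} ml")
```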

The films from the Southerns and Northerns were scanned using a Bio-Rad GS-670 
imaging densitometer. Relative band intensities were determined using Molecular 
Analyst version 2.1. The brightness/contrast of the scans was adjusted for printing
using Adobe Photoshop 3.0. 
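Relative band intensities of this kind are commonly expressed as background-corrected densities divided by that of a chosen reference band. A minimal sketch, assuming integrated densities have already been exported from the densitometry software and that the film response is within its linear range; the band labels and numbers in the example are invented for illustration and are not data from this study.

```python
def relative_intensities(densities, reference, background=0.0):
    """Background-corrected band densities expressed relative to `reference`.

    densities: mapping of band label -> integrated density from the scan.
    background: density of an empty lane region, subtracted from every band.
    """
    ref = densities[reference] - background
    if ref <= 0:
        raise ValueError("reference band is at or below background")
    return {label: (value - background) / ref for label, value in densities.items()}


# Invented values, for illustration only.
example = {"reference band": 1200.0, "test band": 3700.0}
print(relative_intensities(example, "reference band", background=200.0))
```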

Recombinant DNA manipulations. A λ library of BamHI-digested hOG759 DNA was
constructed using the LambdaGEM-11 BamHI Arms Cloning System from Promega,
according to the manufacturer's instructions. The library was screened using the DIG
DNA Labelling and Detection Kit from Boehringer Mannheim. Probes were derived
from clones of pCal. Recombinant λ DNA was prepared according to the protocol
accompanying the lambda cloning system from Promega. Bacterial plasmids were
prepared using an alkaline lysis-polyethylene glycol precipitation method from Applied
BioSystems. Sequencing was performed using a combination of subcloning and
specifically designed oligonucleotide primers. Sequences were determined on an ABI
373A DNA Sequencer and edited using SeqEd 1.0.3. Sequences were aligned and
assembled into contigs using the programs available in the University of Wisconsin
GCG package and HOMED 5. PCRs were performed on an Autogene II programmable
cycling water bath from Grant Instruments, Cambridge. Primers were synthesized on
an ABI 380B oligonucleotide synthesizer. Primers used for the amplification of the 5'
regions of TCa2 retrotransposons from various C. albicans strains were as follows:
Cal1.2, 5'-AGTGAGCTCTGTTGGTTTGTGCACT-3'; Cal2.2, 5'-GCGTCTAGAAATTCTGTACCTTC-3'.
Together these primers can amplify the first
400 bp of the retrotransposon including the complete left LTR. Primers from the
genomic regions flanking the integrated copy of TCa2 were: TGFS-L,
5'-CTACATAGGATGACTCAC-3'; and TGFS-R, 5'-ATCCAAGTCTGAAAGATC-3'.
Temperature cycling consisted of 35 cycles of 95°C for 1 min, 45°C for 1 min, and
72°C for 1 min. PCR products were purified prior to cloning using Strataclean resin
(Stratagene, La Jolla, CA).
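Basic properties of the four primers listed above can be checked quickly. A minimal sketch computing length, GC content and a Wallace-rule melting temperature, Tm = 2(A+T) + 4(G+C) °C. The Wallace rule is a rough approximation intended for short oligonucleotides and is used here only as an illustrative guide; the function names are hypothetical.

```python
PRIMERS = {
    "Cal1.2": "AGTGAGCTCTGTTGGTTTGTGCACT",
    "Cal2.2": "GCGTCTAGAAATTCTGTACCTTC",
    "TGFS-L": "CTACATAGGATGACTCAC",
    "TGFS-R": "ATCCAAGTCTGAAAGATC",
}


def gc_count(seq):
    """Number of G and C bases."""
    s = seq.upper()
    return s.count("G") + s.count("C")


def wallace_tm(seq):
    """Rough melting temperature in °C by the Wallace rule: 2(A+T) + 4(G+C)."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * gc_count(s)


for name, seq in PRIMERS.items():
    gc_pct = 100.0 * gc_count(seq) / len(seq)
    print(f"{name}: {len(seq)} nt, {gc_pct:.0f}% GC, Wallace Tm ~{wallace_tm(seq)} °C")
```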

Nucleotide sequence accession numbers. The nucleotide sequence of the TCa2 
fragment from hOG759 with the perfect 32-bp minus-strand primer-binding site, and 
that of the integrated TCa2 element, have been submitted to GenBank and assigned
accession numbers AF030556 and AF050215, respectively. 

EXAMPLE 1 
Cloning and mapping 

Uncut genomic DNA prepared from Candida albicans strain hOG1042 was
analysed on an agarose gel and a distinct band running at about 6.5kb was found
(Figure 1). Such a band had never previously been reported from any Candida strain
or species. To analyse this feature, the band was extracted from an agarose gel and
tested to see if it could be cut with restriction enzymes. A number of enzymes cut the
band into smaller fragments, which indicated that it was made up of double-stranded
DNA. At this point the band was named pCal (plasmid of Candida albicans). The
restriction digests allowed the construction of a simple restriction map of pCal. This
work revealed that pCal was linear, with a PstI site about 1kb from one end, an
EcoRI site about 1kb from the opposite end and an Asp718 site near the middle. To
permit further analysis, the fragments of pCal produced with Asp718 were cloned into
the Asp718 site of pUC19. Five clones were isolated and each was found to contain
just a single Asp718 site, the other apparently destroyed during the cloning, as
expected. Three of the clones contained a PstI site and two contained an EcoRI site.
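The reasoning that points to a linear map can be made explicit: cutting a linear molecule at n sites yields n + 1 fragments, whereas cutting a circle at n sites yields n. A minimal sketch; the cut positions in the example are hypothetical stand-ins for "about 1kb from one end" and "near the middle", and the 6426bp length is the size later determined by sequencing (Example 2).

```python
def linear_fragments(length, cut_positions):
    """Fragment sizes (bp) after cutting a linear molecule at the given positions."""
    cuts = sorted({p for p in cut_positions if 0 < p < length})
    edges = [0] + cuts + [length]
    return [end - start for start, end in zip(edges, edges[1:])]


def circular_fragments(length, cut_positions):
    """Fragment sizes (bp) after cutting a circular molecule at the given positions."""
    cuts = sorted({p % length for p in cut_positions})
    if not cuts:
        return [length]  # uncut circle
    sizes = [end - start for start, end in zip(cuts, cuts[1:])]
    sizes.append(length - cuts[-1] + cuts[0])  # fragment spanning the origin
    return sizes


LENGTH = 6426                           # bp, from the full sequence
PSTI, ECORI, ASP718 = 1000, 5426, 3200  # hypothetical approximate cut positions

print(linear_fragments(LENGTH, [PSTI]))    # one cut, two pieces
print(circular_fragments(LENGTH, [PSTI]))  # one cut, one piece
print(linear_fragments(LENGTH, [PSTI, ECORI, ASP718]))
```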



EXAMPLE 2 

Nucleotide sequence of pCal 

The five plasmids containing the pCal fragments were all sequenced from both ends in
the hope of finding an identifiable feature which would provide an insight into the
nature of pCal. The first remarkable features to be found were 280bp direct repeats.
The existence of these direct repeats suggested that pCal was likely to be a
retrotransposon. As no other retrotransposon had ever been found existing at a high
copy number in a free, linear, dsDNA form, we determined the complete sequence of
pCal. To do so, the three clones of pCal carrying the PstI site and one of the two
clones carrying the EcoRI site were completely sequenced. In addition, a region of
pCal spanning the central Asp718 site used in the cloning was amplified by PCR and
each strand was sequenced. This analysis confirmed that there was only one Asp718
site and therefore that the clones of each half of pCal truly represented
adjacent fragments.

Assembly of the 6426 bp pCal sequence revealed many characteristics typical of a
retrotransposon. An obvious feature was the identical 280 bp long terminal direct
repeats (LTRs). The borders of these LTRs are short, imperfect, inverted repeats 6 bp
long: 5'-TGTTGG...CCATCA-3'. This repeat is very similar to those found in the LTRs
of TCa1 (TGTTCG), Ty3 (TGTTGTAT), 1731 (TGTTG) and copia (TGTTGGAAT).
Within the LTRs, putative TATA boxes and a polyadenylation signal were identified.
These and other features are highlighted on the sequence of pCal in Figure 2.
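Why the LTR borders count as an imperfect inverted repeat can be checked directly: the right border, read as its reverse complement, should match the left border. This minimal Python sketch uses only the two 6 bp border sequences quoted above. It reports a single mismatch at the third position, which is the imperfection.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeat_mismatches(left: str, right: str) -> list[int]:
    """0-based positions (read 5'->3' along `left`) where `left` differs from the
    reverse complement of `right`; an empty list means a perfect inverted repeat."""
    if len(left) != len(right):
        raise ValueError("border sequences must be the same length")
    rc = reverse_complement(right)
    return [i for i, (a, b) in enumerate(zip(left, rc)) if a != b]


print(reverse_complement("CCATCA"))                    # TGATGG
print(inverted_repeat_mismatches("TGTTGG", "CCATCA"))  # [2]
```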
The minus-strand primer-binding site [(-)PBS] was found adjacent to the left LTR and
consists of the sequence GATTAGAAGTC. This is very similar to the (-)PBS of TCa1,
GATTAGAAG, but complements 11 bases, rather than 9, of a possible tRNA-Arg
cleavage product. The S. cerevisiae retrotransposons Ty1, Ty2 and Ty3 have been
found to contain additional sequences 3' to the (-)PBS which complement additional
regions of the primer tRNA. These additional sequences are likely to be involved in the
packaging of the primer tRNA within the VLP. An additional region of
complementarity is also apparent in pCal: the sequence GCGTTG, approximately 30
nucleotides 3' of the (-)PBS, perfectly complements the sequence CAACGC (bases
19-24) in the primer tRNA-Arg fragment (Figure 3).
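The statement that GCGTTG "perfectly complements" CAACGC refers to antiparallel base pairing. The first base of one strand faces the last base of the other. The minimal Python sketch below checks this pairing for the two sequences quoted above, treating T as U so that DNA-style and RNA-style input can be mixed. The optional G·U wobble handling is an addition for illustration only. The text does not discuss wobble pairs.

```python
# Watson-Crick pairs in RNA; T is normalised to U so DNA-style input also works.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pairs_antiparallel(seq1: str, seq2: str, allow_wobble: bool = False) -> list[bool]:
    """For two equal-length 5'->3' sequences, report whether each base of seq1
    pairs with the opposing base of seq2 when the strands are aligned antiparallel."""
    s1 = seq1.upper().replace("T", "U")
    s2 = seq2.upper().replace("T", "U")
    if len(s1) != len(s2):
        raise ValueError("sequences must be the same length")
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    # Position i of seq1 faces position -(i + 1) of seq2 in an antiparallel duplex.
    return [(a, b) in allowed for a, b in zip(s1, reversed(s2))]


print(all(pairs_antiparallel("GCGTTG", "CAACGC")))  # True
```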

A plus-strand priming site or polypurine tract (PPT) was found immediately upstream 
of the right LTR. It is very similar to the PPT described for TCa1. A second sequence 
very similar to the 3' PPT was found near the middle of pCal (bases 3455-3465).
Internal PPTs which function as plus-strand priming sites have been identified in Ty1 
and HIV1 and may serve to speed up the reverse transcription process. The two pCal 
PPTs and that of TCa1 are compared in Figure 3. We believe that the internal PPT of 
pCal may also be serving as a site for plus-strand initiation during the reverse 
transcription process. 

TCa1 and pCal have very similar (-)PBSs and PPTs and very similar borders to their 
LTRs. A comparison of the remainder of the LTRs, however, revealed that the 
similarity did not extend beyond these regions. 

EXAMPLE 3 

The open reading frames 

Two long open reading frames were found in pCal, the first 972 bp (324 aa) and the
second 4728 bp (1576 aa) long. Conserved motifs from the four pol proteins (protease,
integrase, reverse transcriptase and RNase H) were identified in the second ORF. The
order of these motifs (as listed above) places pCal within the Ty1/copia group of
retrotransposons. The pCal motifs are shown compared to those of other Ty1/copia
elements in Figure 4. No conserved motifs were found in the first ORF, but it is
similar in size and position to the gag genes of other retroelements. Retroelement
gag genes in general are known to be extremely variable, and it is not uncommon for
no identifiable conserved features to be present.

Unlike other retrotransposons, the gag and pol ORFs of pCal are in the same reading
frame, separated only by a UGA termination codon. This arrangement is similar to that
found in mammalian type C retroviruses such as Moloney murine leukemia virus
(MMLV). In MMLV a UAG termination codon separates the gag and pol ORFs.
Translation of the pol ORF occurs via the occasional read-through suppression of the
UAG codon. This suppression requires an 8 bp purine-rich sequence immediately
downstream of the stop codon and an adjacent pseudoknot (a pseudoknot being a
structural element of RNA formed when the nucleotides of a loop region anneal with
nucleotides outside that loop) (ten Dam et al 1982). In pCal, an 8 bp purine-rich
sequence, AAAACAGG, lies immediately downstream of the UGA codon and is
followed immediately by a potential pseudoknot. These features are illustrated in
Figure 5. A further unusual feature is apparent slightly upstream of the UGA codon:
four tandem repeats of the sequence GAAAAA. The role, if any, of this distinctive
sequence in the ribosomal gag-pol transition is unclear.
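Two checks implied by this paragraph are easy to express in code. The first asks whether pol lies in the same frame as gag. The second counts the tandem GAAAAA copies. The minimal Python sketch below makes two assumptions: the 972 bp gag length excludes the UGA codon, and pol begins immediately after that codon, as "separated only by a UGA termination codon" suggests. The toy sequence is hypothetical, not pCal sequence.

```python
import re


def max_tandem_copies(seq: str, unit: str) -> int:
    """Largest number of uninterrupted head-to-tail copies of `unit` in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


def same_frame(offset_a: int, offset_b: int) -> bool:
    """True if two 0-based ORF start offsets share a reading frame."""
    return (offset_a - offset_b) % 3 == 0


# Assumed layout: gag starts at offset 0, 972 coding bases, then UGA, then pol.
pol_start = 0 + 972 + 3
print(same_frame(0, pol_start))  # True

# Hypothetical toy sequence carrying four tandem GAAAAA units.
print(max_tandem_copies("CCGAAAAAGAAAAAGAAAAAGAAAAATTGA", "GAAAAA"))  # 4
```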

EXAMPLE 4 
Copy number of pCal

The copy numbers of other extrachromosomal elements from lower eukaryotes have
been determined. For instance, the 2 micron circle plasmids of Saccharomyces
species exist at 50-100 copies per cell and the Ddp elements of Dictyostelium
discoideum exist at 50-300 copies per cell. When uncut genomic DNA from the
Saccharomyces and Dictyostelium species containing these elements is run out on
agarose gels, the extrachromosomal elements appear as distinct bands running ahead
of the chromosomal DNA. The intensity of the bands relative to that of the
chromosomal DNA is indicative of the elements' copy numbers. These elements are
comparable in size to pCal, and the host genomes are similar in size to that of C.
albicans. Therefore, using the relative intensity of extrachromosomal and
chromosomal DNA in Saccharomyces and Dictyostelium as a guide, we estimated,
from the relative intensity of pCal and hOG1042 chromosomal DNA, that pCal exists
at 50-100 copies per cell.
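The estimate above was made by comparison with reference organisms rather than with an explicit formula. One way to make the underlying reasoning explicit is to assume that band intensity is proportional to DNA mass. A fraction f of the total signal in the element band then satisfies f = nL / (G + nL), where n is the copy number, L is the element length and G is the chromosomal DNA per cell. The minimal Python sketch below rearranges this for n. The proportionality assumption, the 1% signal fraction and the 32 Mb chromosomal content are placeholders chosen for illustration, not values from this work.

```python
def copies_from_signal_fraction(fraction: float, chromosomal_bp: float, element_bp: float) -> float:
    """Estimate element copies per cell from the fraction of total DNA signal in its band.

    Assumes signal is proportional to DNA mass, so
    fraction = n * element_bp / (chromosomal_bp + n * element_bp),
    which rearranges to n = fraction * chromosomal_bp / (element_bp * (1 - fraction)).
    """
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    return fraction * chromosomal_bp / (element_bp * (1 - fraction))


# Placeholder inputs: 1% of the signal in a 6426 bp band, 32 Mb chromosomal DNA per cell.
print(round(copies_from_signal_fraction(0.01, 32_000_000, 6426)))  # 50
```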


EXAMPLE 5 
Phylogenetic analysis

In an attempt to gain a better understanding of the relationship of pCal to other
retroelements, a phylogenetic tree of a number of retrotransposons and retroviruses
was constructed. The data used in the analysis were the predicted amino acid
sequences of the seven conserved domains of reverse transcriptase identified by Xiong
and Eickbush. The tree was constructed using the UPGMA method within the PHYLIP
package and is shown in Figure 6. It is generally consistent with the trees constructed
earlier by Xiong and Eickbush. For instance, the retroviruses and the gypsy-type
retrotransposons are closer to each other than to the Ty1/copia retrotransposons.
Within the retroviral group, HIV1 and RSV are closer to each other than to MMLV, and
within the Ty3/gypsy group CfT-1 and Tf1 form a group, as do the Drosophila elements
17.6, Tom and gypsy. The tree placed pCal with the Ty1/copia elements. This
placement of pCal is in agreement with the fact that pCal has the pol gene order
protease - integrase - reverse transcriptase - RNase H. Such an order is diagnostic for
Ty1/copia elements. Within the Ty1/copia division two broad groups are apparent.
One group contains the Saccharomyces elements Ty1, Ty2 and Ty4, and the other
contains copia and 1731 of Drosophila, Ty5 of Saccharomyces, the plant elements
Hopscotch, Tst1, Ta1 and Tnt1, Osser from the green alga Volvox carteri, and pCal.
Within this second group pCal is the most divergent element. Similar results were
obtained using Neighbor-Joining and Parsimony methods of tree construction.
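For readers unfamiliar with UPGMA, the method repeatedly joins the two closest clusters. It then sets the distance from the new cluster to every other cluster to the leaf-count-weighted average of the old distances. The minimal, independent Python sketch below follows that procedure. It is not PHYLIP's implementation, it returns only the topology (no branch lengths), and the toy distances are hypothetical rather than the reverse-transcriptase data behind Figure 6.

```python
from itertools import combinations


def upgma(names, dist):
    """Minimal UPGMA clustering.

    names: leaf labels; dist: dict mapping frozenset({a, b}) -> pairwise distance.
    Returns the tree topology as nested tuples (branch lengths are not computed).
    """
    # Each active cluster: label -> (subtree, number of leaves it contains).
    clusters = {name: (name, 1) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Join the closest pair of active clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = f"({a},{b})"
        # Distance to the new cluster is the leaf-count-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = ((tree_a, tree_b), size_a + size_b)
    [(tree, _size)] = list(clusters.values())
    return tree


# Hypothetical toy distances, for illustration only.
toy = {
    frozenset(("A", "B")): 2.0, frozenset(("C", "D")): 4.0,
    frozenset(("A", "C")): 10.0, frozenset(("A", "D")): 10.0,
    frozenset(("B", "C")): 10.0, frozenset(("B", "D")): 10.0,
}
print(upgma(["A", "B", "C", "D"], toy))  # (('A', 'B'), ('C', 'D'))
```

UPGMA assumes a roughly constant rate of change across lineages. This is one reason why confirming the tree with Neighbor-Joining and Parsimony, as described above, is worthwhile.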

EXAMPLE 6 

Partial sequencing of additional clones of pCal

At the start of this work all five of the clones of pCal were partially sequenced. When
the partial sequences of the three clones carrying the PstI site, which represent the
left half of pCal, were compared, it was found that one clone differed from the other
two at a small number of sites. To determine the full extent of these differences, it
was decided to completely sequence each of these three clones. When the sequences
were compared, it was found that two of the clones were identical but differed from
the third clone at twelve sites. The differences were all base substitutions. This
finding suggested the possibility that the total population of pCal within a cell might
be made up of a number of subpopulations with different sequences. Such a situation
could arise in a number of ways. For instance, there could be a number of integrated
retrotransposons, varying in sequence, each contributing to the pCal population.
Alternatively, pCal could be a self-sustaining molecule (i.e. independent of any
integrated copies), and the inherent inaccuracy of reverse transcriptase could be
introducing variation into the system. To investigate this idea further, we obtained four
additional clones of pCal from a region which differed among the original clones (from
the 5' border of the 5' LTR to the PstI site at position 905). The region of greatest
variability was then sequenced in each of these new clones. Analysis of the
sequences revealed that the four new clones were identical in sequence to each other
and to the two original clones which had been found to be identical. This result
suggests that the majority of the pCal molecules in the total pCal population are likely
to be very similar, if not identical, in sequence. One cannot, however, rule out the
possibility that more than one integrated retrotransposon is contributing to the pCal
population or that pCal is a self-sustaining system.
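Because all of the differences between clones were base substitutions, the sequences can be compared position by position without introducing gaps. The minimal Python sketch below lists substitution sites and the resulting percent identity for two already-aligned, equal-length sequences. The example sequences are hypothetical.

```python
def substitution_sites(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """List (1-based position, base in seq_a, base in seq_b) for every mismatch
    between two gap-free, equal-length aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a != b]


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which the two sequences agree."""
    return 100.0 * (1 - len(substitution_sites(seq_a, seq_b)) / len(seq_a))


# Hypothetical aligned clone fragments.
print(substitution_sites("ACGTACGT", "ACGAACGC"))  # [(4, 'T', 'A'), (8, 'T', 'C')]
print(percent_identity("ACGTACGT", "ACGAACGC"))    # 75.0
```

Applied to the LTR comparison in Example 9 (identical at 275 of 280 sites), the same calculation gives about 98.2% identity.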
EXAMPLE 7

Expression of pCal extrachromosomal DNA. 

The TCa2 retrotransposon was originally found as an abundant, linear,
extrachromosomal DNA molecule, referred to as pCal, in C. albicans strain hOG1042.
The level of expression of pCal was so high that it could be seen as a distinct band of
about 6.5 kb when uncut hOG1042 DNA was analyzed by agarose gel
electrophoresis. The fact that such a band had not been reported in any other C.
albicans strain suggested that the level of expression of pCal extrachromosomal DNA
is much higher in hOG1042 than in any other strain. To examine this idea further, we
used Southern analysis to compare the level of expression of pCal amongst a variety
of C. albicans strains. The strains examined included hOG1042 and its close relative
hOG759, two recent clinical isolates (SA40 and F16932), and three common
laboratory strains (SGY269, SC5314, and ATCC10261). In addition, to see whether pCal
expression exhibits any temperature dependence, pCal levels were compared between
cells grown at 27°C and cells grown at 37°C. The results are shown in Figure 7. The
upper bands in the figure, running at >20 kb, represent the integrated forms of the
retrotransposon (TCa2). The differences in hybridization intensity of these bands
reflect differences in the copy number of the integrated form (see below). The
extrachromosomal pCal molecules are seen as a band at about 6.5 kb with a
smear trailing off below. On other blots distinct bands can be seen in the smears,
suggesting that the smears represent incomplete or subgenomic reverse transcripts
rather than the result of degradation during the DNA isolation procedure.
A broad range of fragment sizes, as well as molecules of discrete lengths, has
similarly been reported for reverse transcripts isolated from Ty1 particles (Garfinkel et
al 1985). With these points in mind, it can be seen that pCal expression varies greatly
amongst the various strains and is strongly dependent on temperature. As
expected, the highest levels of pCal were found in hOG1042 and the closely related
strain hOG759. An abundance of pCal molecules was also found in two other strains,
SA40 and F16932. Densitometric analysis indicated that the level of expression in



tjk0879 



PATENT 
674521-2001.1 

these two strains is approximately a fifth of that in hOG1042 and hOG759. A low level
of pCal expression was found in two strains, SGY269 and SC5314 (about 50- to
100-fold lower than in hOG1042 and hOG759). The majority of pCal molecules in
SC5314 appear to be less than full-length. This seems to be a characteristic of this
strain, rather than the result of degradation of this particular sample, as it was
seen consistently with different DNA preparations. The last strain, ATCC10261,
produced no detectable extrachromosomal pCal molecules at all. In each strain that
produces pCal, a much higher level of pCal expression was found at 37°C than at 27°C.
Densitometric analysis indicated a 10- to 20-fold difference in expression between the
two temperatures.

EXAMPLE 8 

TCa2 RNA expression. 

The results showed that the number of pCal molecules per cell varies greatly amongst
different strains. This strain-dependent expression could arise in a number of different
ways. It could result from strain-specific differences in the efficiency of reverse
transcription of the retrotransposon RNA molecules. Alternatively, each of the strains
could have a similar potential for reverse transcription, but there could be widely
varying amounts of RNA for the reverse transcriptases to act upon. A combination of
these two possibilities could also be responsible. In an attempt to distinguish between
these three scenarios, RNA was extracted from each of the seven C. albicans strains
using cells grown at either 27°C or 37°C. The RNA was then subjected to Northern
analysis using the same probe as in the Southern blot shown in Figure 7. The results are
presented in Figure 8. It can be seen, by comparing Figure 7 and Figure 8, that the
pattern of TCa2 RNA expression is very similar to the pattern of pCal DNA
expression. In each strain there is a greater amount of TCa2 RNA in cells grown at
37°C than in cells grown at 27°C. Densitometric analysis indicates a 5- to 10-fold
difference between the two temperatures. Also, the strains which produce the largest
amounts of pCal DNA, in general, also have the largest amounts of TCa2 RNA.

The close similarity between the observed patterns of pCal DNA and TCa2 RNA
expression, together with the fact that pCal is a small, linear, extrachromosomal DNA
molecule, however, raises the possibility that the signals seen on the Northern blot in
Figure 8 may not represent RNA at all; instead, they might be the result of hybridization
to pCal DNA contaminating the RNA preparations. To test this possibility, RNA
samples were treated with DNase-free RNase A for 30 minutes and then compared to
untreated RNA samples by Northern blotting using the TCa2 probe (not shown). We
found that after the RNase A treatment less than 10% of the hybridization signal
remained, indicating that the great majority of the signal seen in Figure 8 does truly
represent hybridization to RNA. In addition, pCal DNA samples were denatured under
the same conditions as the RNA and then examined by an identical Northern
blotting procedure (not shown). We found that, under the Northern blotting
conditions, pCal DNA gave only a very weak signal. This suggests that even the
hybridization signal that remains after RNase A treatment of the RNA samples is
unlikely to be due to contaminating DNA but, rather, is likely to represent
incompletely digested RNA.

The similarity in the patterns of TCa2 RNA and pCal DNA expression suggests that
the strain-dependent variations in the levels of pCal DNA are largely the result of
similar inter-strain variations in the levels of TCa2 RNA. Put another way, the
inter-strain variations in the levels of pCal DNA are introduced mainly at the level of
transcription rather than reverse transcription. The inter-strain variations in pCal
expression, however, are unlikely to be produced exclusively at the transcriptional
stage. It can be seen from Figures 7 and 8 that the patterns of TCa2 RNA and pCal
DNA expression, though very similar, are not exactly the same. For instance, SGY269
and SC5314 produce significantly more pCal than ATCC10261, yet both of these
strains have lower levels of TCa2 RNA than ATCC10261. In addition, F16932 and
SA40 have similar amounts of pCal, but F16932 has approximately 5-fold more TCa2
RNA. These differences are probably the result of variations introduced at the level of
reverse transcription.
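The comparison between F16932 and SA40 amounts to comparing the amount of pCal DNA produced per unit of TCa2 RNA. The minimal Python sketch below computes that ratio relative to a reference strain. It uses only the relative values stated above: similar DNA levels and about 5-fold more RNA in F16932. It assumes the densitometric signals are linear in the amount of nucleic acid, which the text does not state. The ratio is a crude index and not a direct measure of reverse-transcription efficiency.

```python
def dna_per_rna(levels: dict[str, tuple[float, float]], reference: str) -> dict[str, float]:
    """Ratio of pCal DNA to TCa2 RNA for each strain, normalised to a reference strain.

    `levels` maps strain -> (relative DNA signal, relative RNA signal). A value below 1
    means less extrachromosomal DNA per unit of RNA than in the reference strain.
    """
    ref_dna, ref_rna = levels[reference]
    ref_ratio = ref_dna / ref_rna
    return {strain: (dna / rna) / ref_ratio for strain, (dna, rna) in levels.items()}


# Relative values following the text: similar pCal DNA, ~5-fold more RNA in F16932.
print(dna_per_rna({"SA40": (1.0, 1.0), "F16932": (1.0, 5.0)}, reference="SA40"))
# {'SA40': 1.0, 'F16932': 0.2}
```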



EXAMPLE 9

Comparison of TCa2 LTRs from various strains. 

It is possible that the differences in the levels of TCa2 RNA seen in the different
strains result from differences in the promoters of the retrotransposons in those
strains. As an initial means of testing this possibility, we cloned and sequenced the
first 400 bp, including the entire left LTR, of TCa2 retrotransposons from each of the
various strains. By analogy with other retrotransposons, this region should contain all
the major sequences regulating transcription. The sequences are shown compared to
each other in Figure 9. It can be seen that the sequences are all remarkably similar to
one another, there being no insertions or deletions and very few base substitutions.
The few differences that there are do not seem to fall into a pattern that can be easily
explained by relatedness of the various elements. The variations appear to be located
in a non-random manner, some sites seeming more prone to variation than others.
These variable sites may represent hotspots for mutation during reverse transcription.
Within the LTRs, the sequences are identical at 275 out of 280 sites, and there is no
obvious correlation between the differences and the abundance of TCa2 RNA in the
host strains. It therefore seems unlikely that differences in the promoters of the TCa2
retrotransposons in the various strains could account for the observed differences in
RNA expression.

An interesting finding that did emerge from this work, though, is that there is
variation in the sequence of the minus-strand primer-binding site (PBS). The PBS is a
short sequence adjacent to the left LTR which is complementary to part of a
cytoplasmic tRNA. The tRNA binds to the retrotransposon RNA at this site, and its
3'-OH can then be used by RT to prime minus-strand DNA synthesis. In most
retrotransposons and retroviruses, the PBS complements the 3' end of the primer
tRNA. TCa2 and a few other Ty1/copia retrotransposons, for example Ty5 and copia,
are exceptions to this general rule in that their PBSs complement an internal region of
the primer tRNA, and the primer is not a complete tRNA but, rather, a 39- or
40-nucleotide fragment of one. In the original description of the pCal sequence, the PBS
was predicted to be 11 bases long, by comparison to tRNA-Arg(UCU) of S. cerevisiae.
Since then the sequence of tRNA-Arg(UCU) of C. albicans has become available. A
comparison of the pCal sequence to this tRNA showed that the homology between
the pCal RNA and the tRNA primer extends over 32 bp, although there would be a
number of unpaired bases in the PBS-tRNA primer duplex. Comparison of the
sequences obtained here, however, shows that the variations found in the PBS region
actually give some clones a better match to the tRNA primer fragment than that
found in the original pCal sequence. One LTR in particular, isolated from hOG759, has
5 base substitutions relative to the original pCal sequence, and these result in a
perfect 32-bp match to the primer tRNA. In addition, the region between the PBS and
the start of the gag ORF was found to have the potential to form a stem-loop.
The possible secondary structure of the tRNA primer fragment and the 5' region of
the TCa2 RNA, as they might appear in the minus-strand priming complex, is depicted
in Figure 10.


EXAMPLE 10 

TCa2 is a moderately repetitive element and may still be active. 

An important unanswered question, which may have implications for the regulation of
this system, is how abundant the integrated chromosomal copies are in the various
strains. To answer this question, we subjected genomic DNA samples from each
strain to Southern analysis using either the internal TCa2 fragment or the LTR as a
probe (Figure 11). The DNA samples used were isolated from cells grown at 27°C to
minimize interference from the extrachromosomal copies. Also, to see whether TCa2 is
specific to C. albicans or is also found in other Candida species, we
analyzed the closely related species C. maltosa, C. parapsilosis, and C. tropicalis and
the more distantly related C. pseudotropicalis. The locations of the internal and LTR
probes and some important restriction sites are shown in Figure 11A. In Figure 11B it
can be seen that in SGY269, SC5314, ATCC10261 and SA40 the element TCa2 is
present at a low copy number, just one or two copies per cell. In F16932 five bands
were found that hybridized to TCa2, indicating a moderate copy number in this strain.
No hybridization to DNA from any of the other Candida species analyzed was
detected, suggesting that TCa2 is specific to C. albicans. This was true even when the
blot was reprobed at lower stringency and exposed for a long period of time (not
shown). In Figure 11C it can be seen that the TCa2 LTR is more abundant in
SGY269, SC5314, ATCC10261, and SA40 (5 to 7 copies per cell) than the
full-length retrotransposon. The number of LTRs in F16932 is hard to determine from this
exposure because the bands are close together. Analysis of a variety of different
exposures, however, revealed about 12 bands hybridizing to the LTR in this strain
(not shown).

Determining the copy number of TCa2 in hOG1042 proved to be more problematic.
Even though the DNA used was isolated from cells grown at 27°C (in which the
expression of pCal is 10- to 20-fold lower than in cells grown at 37°C; Figure 7), it
was found that the signal from the extrachromosomal copies overwhelmed any signal
from the integrated copies to such an extent that no bands could be distinguished
(lane 1 in Figures 11B and 11C). To get around this problem, we purified the intact
chromosomal DNA away from the extrachromosomal copies of pCal by separating the
two on agarose gels and then extracting the chromosomal DNA from the gels. This was
done for both hOG1042 and the closely related strain hOG759. The copy number of
TCa2 in each strain was then determined by Southern analysis (Figure 12). Three
different enzymes, PstI, EcoRI, and ClaI, were used to cleave the DNA prior to
electrophoresis. The number of bands detected varied depending on which enzyme
had been used. Four or five bands were detected in PstI-cut DNA, four bands were
found when EcoRI had been used, and eight or nine were detected in ClaI-cut DNA.
Each of these enzymes cuts TCa2 on just one side of the probe, so the bands detected
should represent DNA molecules containing a fragment of TCa2 and the flanking DNA
out to the nearest cleavage site for each enzyme. These fragments will generally be of
different sizes and so will appear as separate bands. However, where the distance out
to the nearest flanking restriction site is similar for retrotransposons at two different
genomic loci, the resulting fragments will comigrate in the gel and give a single band
of increased intensity. The finding that the TCa2 probe hybridizes to different numbers
of bands depending on the enzyme used (Figure 12), and that the bands vary in
intensity (for example, Figure 12, lane 1), suggests that the brighter bands represent
more than one integrated TCa2 retrotransposon. In such a situation, the digest giving
the greatest number of bands, and bands the most similar in intensity, is the most
reliable indicator of copy number. Here this is the ClaI digests. Even in the ClaI
digests, however, some bands appear at greater intensity than others, suggesting that
they may also represent more than one integrated copy of TCa2. Taking this into
account, and given that the ClaI digests give 8 or 9 bands, we estimate that there are
10 to 12 integrated copies of TCa2 in hOG759 and hOG1042. Overall, the
hybridization patterns found for hOG759 and hOG1042 are very similar. Interestingly,
however, they are not identical. In the PstI digests (Figure 12, lanes 1 and 2),
hOG1042 has a band at about 7.5 kb that is not found in hOG759. In the EcoRI digests
(lanes 3 and 4) the two strains give the same bands, but the band at about 7 kb is
brighter in hOG1042. Again, in the ClaI digests (lanes 5 and 6), hOG1042 has a band at
about 11 kb that is not found in hOG759. Together, these findings suggest that there is
at least one more copy of TCa2 in hOG1042 than in its close relative hOG759. Given the
abundance of full-length copies of the retrotransposon in these strains, the most likely
explanation for this finding is that a copy of TCa2 has integrated into the hOG1042
genome in the short time since the divergence of this strain from hOG759.
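The reasoning above adjusts a raw band count upward for bands that look brighter than the rest. The minimal Python sketch below shows one way to formalise it. It assumes each band contains a whole number of comigrating junction fragments, and that the signal from a single copy can be calibrated, for example from a band known to be single-copy. The text does not describe such a calibration, so it is an assumption here. The intensities in the example are hypothetical and are not densitometry from Figure 12.

```python
def estimate_copies(band_intensities: list[float], single_copy_intensity: float) -> int:
    """Estimate integrated copies from the hybridising bands of one digest.

    Each band's intensity is divided by the signal expected from one copy and
    rounded to a whole number of comigrating fragments (at least one per visible band).
    """
    return sum(max(1, round(i / single_copy_intensity)) for i in band_intensities)


# Hypothetical densitometry: four bands, one of roughly double intensity.
print(estimate_copies([1.0, 2.1, 0.9, 1.05], single_copy_intensity=1.0))  # 5
```

Such an estimate inherits the uncertainty in the calibration, which is why the text gives a range (10 to 12 copies) rather than a single number.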
It is interesting to note that the number of integrated copies of TCa2 in each strain
correlates fairly well with the levels of TCa2 RNA produced by each strain. For
instance, the highest amounts of TCa2 RNA are found in hOG759 and hOG1042,
which also have the greatest number of integrated copies. F16932, with about 5
integrated copies of TCa2, has the next highest amount of RNA, and SGY269,
SC5314, ATCC10261, and SA40, with 1 or 2 TCa2 elements apiece, have only low
levels of TCa2 RNA. It is not a simple, or linear, correlation, however: hOG759 and
hOG1042 produce at least 50 times as much TCa2 RNA as SGY269, SC5314, etc.,
but they have just 10 times as many integrated copies. This indicates that factors in
addition to TCa2 genomic copy number are involved in generating the variable
levels of TCa2 transcripts.
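The lack of proportionality can be stated as a short worked ratio. It uses only the two figures given above: at least 50-fold more RNA and about 10-fold more integrated copies.

```latex
\frac{\text{TCa2 RNA per integrated copy (hOG759, hOG1042)}}
     {\text{TCa2 RNA per integrated copy (SGY269, SC5314, \ldots)}}
\;\geq\; \frac{50}{10} \;=\; 5
```

On these figures, each integrated copy in hOG759 and hOG1042 is associated with at least five times as much TCa2 RNA. This is consistent with the conclusion that additional factors are involved.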



EXAMPLE 11
An integrated copy of TCa2.

The sequence of pCal was primarily based on two clones that were derived from the
pool of extrachromosomal copies in hOG1042. To determine whether this sequence is
typical of the TCa2 retrotransposon family, or whether it differs in some important way
from the integrated copies, we constructed a λ library of hOG759 DNA and from it
cloned and sequenced a full-length, integrated copy of TCa2. The sequence of this
copy of TCa2 (GenBank accession no. AF050215) is very similar to that of pCal. Over
their entire length of 6426 bp the two elements differ at only three sites, each of
these differences being the substitution of one base for another. Two of these base
substitutions occur in the region encoding the RT, and the other is in the RNase H
coding region. The base changes do result in changes to the predicted amino acid
sequences of the RT and RNase H proteins. It is possible that these amino acid
alterations result in significant differences in the catalytic properties of the RTs and
RNase Hs. Whether or not such changes play a role in the over-production of pCal in
some strains is uncertain. It may be instructive to compare the sequences that we
have determined for TCa2 and pCal with the sequence of a copy of TCa2 from a strain
which produces only low amounts of pCal, such as SGY269 or SC5314. In any case,
the finding that an integrated copy of TCa2 has an almost identical sequence to pCal
indicates that there are no major sequence differences distinguishing the
extrachromosomal forms of this retrotransposon from the integrated copies.

The DNA sequence of the regions flanking the integrated copy of TCa2 was also
determined (not shown). Starting about 800 bp upstream of the retrotransposon is
sequence virtually identical to that of the 5' region of the C. albicans CDR1 gene
(Prasad et al 1995), which has been assigned to chromosome 3
(http://alces.med.umn.edU/candida/maps/3.html). About 100 bp downstream is the
start of an ORF that bears a strong resemblance to the 5' regions of cytoplasmic
dynein heavy chain genes found in some other fungi. A C. albicans sequence
containing an ORF that bears a strong resemblance to the central region of other
fungal cytoplasmic dynein heavy chain genes has previously been assigned to
chromosome 3 (http://alces.med.umn.edu/bin/genelist LDYN1). These findings
indicate that the cloned copy of TCa2 is located on chromosome 3, between CDR1
and a gene encoding cytoplasmic dynein heavy chain. Using PCR with primers
corresponding to sequences on either side of the TCa2 integration site, we were able
to amplify and sequence, from hOG759, another allele without an integrated
retrotransposon. This work therefore revealed that this locus is heterozygous for the
presence of TCa2, and it also showed that the insertion of TCa2 resulted in a
duplication of 5 bp (ACACG) at the integration site, as is commonly found with other
retrotransposons.
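A target-site duplication appears as the same short sequence at the end of the left flank and at the start of the right flank of the insertion. The minimal Python sketch below finds the longest such shared sequence. Only the 5 bp ACACG is taken from the text. The surrounding flank sequences are hypothetical. Very short matches (1 or 2 bp) can arise by chance. Confirming the candidate against an empty allele, as was done here with the hOG759 allele lacking TCa2, is therefore the more reliable test.

```python
def target_site_duplication(left_flank: str, right_flank: str, max_len: int = 10) -> str:
    """Return the longest sequence that ends the left flank and also begins the
    right flank of an insertion (a candidate target-site duplication), or '' if none."""
    for k in range(min(max_len, len(left_flank), len(right_flank)), 0, -1):
        if left_flank[-k:] == right_flank[:k]:
            return left_flank[-k:]
    return ""


# Hypothetical flanks; only the 5 bp duplication ACACG comes from the text.
print(target_site_duplication("GGGTTACACG", "ACACGCCAT"))  # ACACG
```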


DISCUSSION OF RESULTS OF EXAMPLES 7-11

Expression of pCal DNA is strongly dependent on temperature and varies greatly
among C. albicans strains. The expression of TCa2 RNA occurs in a similar pattern to
that of the pCal DNA, suggesting that the variations in pCal expression are introduced
predominantly at the level of transcription. A comparison of the 5' sequences of TCa2
retrotransposons from various strains, however, failed to identify any intrinsic
differences which could account for the observed strain variations in expression.
Some elements, though, were found to have very long tRNA primer-binding sites,
which may predispose them to efficient reverse transcription. The integrated TCa2
form was found to be a moderately repetitive element, present at 1 to about 10
copies per genome. TCa2 copy number correlates well with TCa2 RNA expression,
but is insufficient to account for all the strain variation, suggesting the involvement of
other factors. Sequence analysis of an integrated copy of TCa2 showed that it is very
similar to pCal and is inserted between two closely placed genes. Variation in TCa2
copy number between two closely related strains suggests that the element is still
transpositionally active.


EXAMPLE 12

Further retrotransposons have been found. These are shown in Figures 17-48.

Isolation of the C. albicans retrotransposon sequences began with a search, using the
BLASTN program (Altschul et al 1990, 1997) version 2.0.4, for sequences similar to the
C. albicans retrotransposon sequences present in the EMBL Nucleotide Sequence
Database (Stroger et al 1988) release 56. A total of 28 similar sequences
were identified in the proprietary Pathoseq™ database (Incyte Pharmaceuticals Inc,
Palo Alto, CA, USA). These are different from the complete retrotransposon
sequences presently available, or extend the partial retrotransposon sequences
presently available.

The majority of the retrotransposons are not complete. However, these partial
retrotransposons can, for example, be used as probes to identify the complete
sequences when screening a DNA library. The full-length retrotransposon sequences
are themselves potentially useful as variants of the described TCa2. For example, the
LTR promoter of TCa2 shows a different activity pattern (e.g., temperature inducibility)
from that of another, unrelated retrotransposon, TCa1: TCa1 is less transcriptionally
active at 37°C than at 27°C, while TCa2 is more active at 37°C than at 27°C.

Retrotransposon 15 (Figure 32) is complete and can be used in an expression and
disruption system. For example, it can be used to provide an expression vector which
includes retrotransposon 15, and it could be used in a gene disruption system in
Candida. It may also be used as a transformation and expression system for Candida
comprising the retrotransposon.

EXAMPLE 13

The production of auxotrophic mutants from strain iB65 (the original strain from
which all the pCal-carrying strains were derived) and its derivatives. This example
shows the appearance of an auxotrophic mutant allele in the strains derived from
iB65.

i) The strain was isolated from an undergraduate mouthwash (iB = intermediate
Biology) in 1984. iB65 was heterozygous for a methionine auxotrophy and
gave rise (following UV irradiation) to a number of homozygous methionine
auxotrophs in 1984, including hOGMet5;

ii) hOGMet5 (Met). This strain was exposed to N-methyl-N'-nitro-N-nitrosoguanidine
mutagenesis and gave rise to numerous red adenine auxotrophs (some of them
designated hOG758-hOG762). Some of these were ade1 and some ade2. An
unusual feature was that some (for example hOG759: Ade1 Met) were
completely non-revertible.

iii) Strain hOG762 (Ade2 Met) was exposed to a further round of UV irradiation in
1988 and gave rise to numerous auxotrophs of a unique type. These
auxotrophs required either aspartic acid or proline or alpha-ketoglutarate.
They are some kind of glyoxylate/TCA cycle mutant. We have never
encountered TCA cycle mutants before or since. These auxotrophic mutants,
like the ade1 mutant described above, were absolutely non-revertible, even
after mutagenesis. This is most unusual for Candida mutants. Strain hOG762
must have become heterozygous for the Asp/Pro mutant allele and therefore
acquired the characteristic of producing 'Asp/Pro' auxotrophic homozygous
derivatives.

We believe that the non-revertible ade1 and asp/pro mutant alleles produced in these
strains were generated by insertions of TCa2. Such an insertion would give a
non-revertible mutant allele.

The pCal carrying strains gave rise to non-revertible mutants (as would be expected 
given the abundance of linear retrotransposon DNA). 

We have tested this hypothesis by comparing Southern blots of hOG1042 (a strain
carrying the asp/pro mutant allele heterozygously) and hOG759 (Figure 15). There is
an additional band present in hOG1042 (EcoRI, Lane 4; ClaI, Lane 6), which is what
would be expected if hOG1042 carries an additional copy of TCa2 integrated into the
mutant asp/pro allele.

tjk0879
PATENT
674521-2001.1

We have also tried to find evidence for TCa2 retrotransposition in strains of this
family in the absence of any mutagenesis or phenotypic change. This is shown in
Figure 16, most obviously in Lanes 6 and 11. In general, the strains that show extra
bands following an EcoRI digest also show extra bands following a ClaI digest. This
helps confirm that the strains carry extra copies of the TCa2 retrotransposon.

These Southern blots demonstrate that TCa2/pCal is retrotranspositionally active. If
the element transposes at this frequency in the absence of selection, then in the
presence of selection it should be relatively easy to isolate strains carrying disrupted
alleles.

There are several ways of applying selection, but the simplest would be to include a
selectable gene within the retrotransposon.

The asp/pro allele is an example of gene disruption by the retrotransposon. 

Examples 14-18 show the characterisation of the integrated form of TCa2 and a
comparative analysis of its expression.

EXAMPLE 14 

The use of TCa2 as an expression system and as a transformation system:
construction of a Vector System with the Candida albicans Retrotransposon pCAL

The aim was to create a vector system based around the C. albicans retrotransposon- 
like element. The plasmid pRPU3 was constructed in which a URA3 sequence was 
placed within the retrotransposon at the very end of the ORF2 coding sequence,
adjacent to the 3' untranslated region. The URA3 gene is on its own promoter, and it
functions to confer prototrophy on ura3 auxotrophs following transformation. This
demonstrated that a selectable gene, such as URA3, can be placed in this position
and still function.

Materials and Methods 

Microbial Strains and Plasmids 

For in vitro plasmid construction and for plasmid amplification, the E. coli strain DH5α
was used (Woodcock et al., 1989).

The strain from which pCAL was isolated was hOG1042, a C. albicans auxotrophic
isolate, derived from an oral isolate by mutagenising the parental strain and selecting
for red adenine auxotrophs.

Four other yeast strains were used in the transformation of the retrotransposon-based
plasmids. These were SGY269, GSY112, MIB1 and CHAU1. They were selected as
recipients for the transforming DNA because of their uridine auxotrophies and defined
genotypes. SGY269, a C. albicans strain derived from the parental strain A81-Pu by
directed mutagenesis (Kelly et al., 1987), has the genotype ade2/ade2
ura3::ADE2/ura3::ADE2. GSY112 is a haploid ura3 leu2 auxotrophic S. cerevisiae
strain, MATα ura3 pep4::HIS3 prb1-Δ1.6R leu2::hisG can1 cir°
(Wagenbach et al., 1991). MIB1 is an S. cerevisiae strain constructed for this work
and is auxotrophic for both adenine and uridine. It was created by crossing a 1.0
(Woods and Bevan, 1965) with GSY112. The diploids were sporulated and an
ade1 ura3 segregant was purified. CHAU1, a C. maltosa strain (Ohkuma et al., 1993),
has the genotype his5/his5 ade1/ade1 ura3/ura3.
The plasmids used in the construction of the retrotransposon-based plasmids were
kindly donated by various laboratories. Plasmid pET3 was provided by E. Y. H. Tsay
(Gillum et al., 1984), pSM7 by M. B. Kurtz (Kurtz et al., 1987) and pRC2312 by
R. Cannon (Jenkinson et al., 1988). The E. coli plasmids pUC19
(Yanisch-Perron et al., 1985) and pBluescript (Short et al., 1988) were used in the
cloning exercises. pK19 and pUCK1 are plasmids in which the kanamycin cassette
from M13mp18-19 (Markie et al., 1986) was inserted into the ScaI site of pUC19;
pUCK1, however, lacks some of the restriction sites in the cloning cassette. pNRE1 is
a plasmid containing the kanamycin cassette from M13mp18-19 as an EcoRI
fragment in pUC19, made ampicillin-sensitive by removing the PvuI portion of
pUC19.



Oligonucleotides 



Two primers, CalR and CalL, were designed to create a unique NsiI restriction
endonuclease recognition site (ATGCAT) at the very end of the pol ORF of pCAL. The
overlapping primers match the pCAL sequence at all but one position, providing a site
for the insertion of a selectable marker. The mismatch is a T instead of an A at the
fourteenth position of CalR, and the complementary A at the tenth residue of CalL.

The sequence of CalR is 5'GATACAAAATGCATTAACGGCAG3' and the sequence of
CalL is 5'CTGCCGTTAATGCATTTTGTATC3'. These primers were used in
conjunction with the universal forward and reverse primers complementary to pUC19.
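The primer design just described can be checked with a short script. This is a minimal sketch that uses only the two primer sequences as printed; the "native" sequence in it is reconstructed by reverting the single stated mismatch (T back to A at position 14 of CalR), which is an assumption rather than sequence read from pCAL itself, and the helper names are illustrative.

```python
# Sketch: check that CalR and CalL are exact reverse complements and that the
# single designed mismatch is what creates the NsiI site (ATGCAT).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


CALR = "GATACAAAATGCATTAACGGCAG"
CALL = "CTGCCGTTAATGCATTTTGTATC"
NSII_SITE = "ATGCAT"

# Assumed native sequence: CalR with position 14 (1-based) reverted from T to A.
native = CALR[:13] + "A" + CALR[14:]


def paired_position(pos: int, length: int) -> int:
    """1-based position on the complementary primer that base-pairs with pos."""
    return length - pos + 1


if __name__ == "__main__":
    print("CalL is the reverse complement of CalR:", revcomp(CALR) == CALL)
    print("NsiI site in assumed native sequence:", NSII_SITE in native)
    print("NsiI site starts at CalR position", CALR.find(NSII_SITE) + 1)
    p = paired_position(14, len(CALR))
    print(f"CalR position 14 = {CALR[13]}; CalL position {p} = {CALL[p - 1]}")
```

Run as written, it reports that the primers are fully complementary, that the NsiI site (positions 9-14 of CalR) is absent from the reverted sequence, and that position 14 of CalR (T) pairs with position 10 of CalL (A), consistent with the description above.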

Another pair of primers was designed to amplify the C. albicans URA3 gene from the
plasmid pET3 with PstI restriction sites on the ends. The underlined portion of
5URATT (5'CGACGGCTGCAGTTCTTCAATGATGATTTCAAC3') is complementary to the
upstream region of the gene described by Losberger and Ernst (1989), and the
underlined portion of 3URA (5'CGACGGCTGCAG CCTTCACATTTATAATTGGC3') is
complementary to the 3' end of the gene but does not include any non-coding regions.

Primers were also designed to amplify the URA3 gene and the retrotransposon LTR
after the two had been cloned adjacent to each other in the plasmid pRPU3 (described
later). A primer corresponding to the 5' end of the URA3 gene, URAXMAS1, and a
primer complementary to the 3' end of the right LTR, 3LTR, were synthesised.

URAXMAS1 (5'GCGAGATCTA GATATGACAGTCAACACTAAG3') contains a
synthetic XbaI restriction site and allows a fusion construct to be made in frame with
PCR products derived from CAL2 and CAL5 (described below). No promoter
sequences are amplified with this primer. 3LTR
(5'CGACGCCTGCAG GTGATGGAATATAAACTTTC3') contains a synthetic PstI
restriction site. The underlined region is that which is complementary to the 3' end of
the retroelement's right LTR.

Three primers were designed to amplify portions of the retroelement for further
analysis. CAL1 (5'AGTGAGCTC TGTTGGTTTGTGCACT3') contains a synthetic SacI
restriction site, and the underlined region complements the 5' end of the left LTR.
CAL2 (5'GCGTCTA GAAATTCTGTACCTTC3') is complementary to a region of the
5' LTR just upstream of the gag ORF. CAL2 in conjunction with CAL1 allows for the
amplification of the left LTR. CAL5 (5'GCGTCTAGAA CATTCCAGTGAAGT3')
complements the region spanning the UGA stop that separates the gag and pol
ORFs. A single-base mismatch changes the TGA stop to a TGT codon. Both CAL2
and CAL5 contain XbaI restriction sites to allow the fusion of the URA3 gene (in
frame in the case of CAL5). CAL5 in conjunction with 101F
(TCTAAGCTACCAAAGCAC) enables the amplification of a portion of the gag ORF
and removal of the stop codon, so that the gag and pol ORFs are contiguous.
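The synthetic restriction sites claimed for each primer above can be confirmed by a simple string search. In this sketch the primer sequences are copied from the text with spaces removed; the recognition sequences (SacI GAGCTC, XbaI TCTAGA, PstI CTGCAG) are the standard ones, and the function names are illustrative.

```python
# Sketch: locate the synthetic restriction site claimed for each primer.
SITES = {"SacI": "GAGCTC", "XbaI": "TCTAGA", "PstI": "CTGCAG"}

# (sequence 5'->3' as printed, spaces removed; enzyme named in the text)
PRIMERS = {
    "CAL1": ("AGTGAGCTCTGTTGGTTTGTGCACT", "SacI"),
    "CAL2": ("GCGTCTAGAAATTCTGTACCTTC", "XbaI"),
    "CAL5": ("GCGTCTAGAACATTCCAGTGAAGT", "XbaI"),
    "URAXMAS1": ("GCGAGATCTAGATATGACAGTCAACACTAAG", "XbaI"),
    "3LTR": ("CGACGCCTGCAGGTGATGGAATATAAACTTTC", "PstI"),
    "5URATT": ("CGACGGCTGCAGTTCTTCAATGATGATTTCAAC", "PstI"),
    "3URA": ("CGACGGCTGCAGCCTTCACATTTATAATTGGC", "PstI"),
}


def site_positions(seq: str, site: str) -> list[int]:
    """1-based start positions of every (possibly overlapping) match."""
    n = len(site)
    return [i + 1 for i in range(len(seq) - n + 1) if seq[i:i + n] == site]


def check_primers(primers=PRIMERS) -> dict[str, tuple[str, list[int]]]:
    """Map each primer name to (enzyme, positions of that enzyme's site)."""
    return {name: (enzyme, site_positions(seq, SITES[enzyme]))
            for name, (seq, enzyme) in primers.items()}


if __name__ == "__main__":
    for name, (enzyme, hits) in check_primers().items():
        status = "OK" if len(hits) == 1 else "CHECK"
        print(f"{name:9s} {enzyme:5s} site at {hits} {status}")
```

Each primer should report exactly one site. The three to six bases 5' of each site act as a clamp, since many restriction enzymes cut poorly at the very end of a DNA molecule; that is a general design convention rather than something stated in the text.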

DNA manipulations 
Plasmid DNA isolation and plasmid subcloning, recombinant plasmid construction, and
restriction mapping were all performed according to Maniatis et al. (1982).
Transformation of E. coli DH5α was performed according to the method of Maniatis et
al. (1982) with some modifications: instead of recovering in SOC medium, 500 µL of
TB was used. Cells were plated onto BB plates (10 g/L tryptone, 8 g/L NaCl) with
antibiotic selection. DNA fragments were purified after electrophoresis in
low-melting-point agarose (FMC Bioproducts, USA) using agarase (GELase™, Epicentre
Technologies, USA) according to the manufacturer's instructions.

Construction of pRPU3 (a marked element)

The pCAL retrotransposon was originally discovered as a linear extrachromosomal
element in C. albicans strain hOG1042. It was cloned into pUC19 as two halves
using a central Asp718 site. The resulting clones each had one Asp718 site, the
other having been destroyed during the cloning procedure, as expected.

Two of these clones, p30 and p36, represent the 5' half of pCAL, whilst another two
clones, p5 and p45, represent the 3' half of the element. An EcoRI site in the cloning
cassette of p30 was subsequently destroyed by digesting the plasmid with EcoRI,
filling in the ends with Klenow and religating. This plasmid, p30E*, was then
digested with Asp718 and BamHI, and the retrotransposon fragment from a similarly
digested p45 was ligated in. The new plasmid, pUCCAL, was sequenced. pUCCAL
has the same structure as the native retrotransposon. However, further sequencing of
p36, p5 and additional clones of pCal revealed that the two fragments used to create
pUCCAL differed from all the others, presumably because of point mutations incurred
during reverse transcription.

The following describes the construction of a plasmid with a DNA sequence that
conforms to the most common form of pCAL, the construction of an NsiI restriction site
within this sequence, and the addition of a selectable marker and a C. albicans origin
of replication. The cloning strategy is shown in Figure 49. Separate PCR products
were generated using the primer CalL with the universal primer of pUC19, and CalR
with the reverse primer of pUC19. The template was p45E*, a plasmid containing
979 bp of the 3' end of the retrotransposon from p45. The PCR products were joined
using the new NsiI site and cloned into pUC19, and the plasmid was named pNsi. The
EcoRI/HindIII fragment from pUCCAL was replaced with the EcoRI/HindIII fragment
from pNsi. The presence of the NsiI site in the resulting plasmid, pCALNsi, was
confirmed by restriction digest.

The plasmid containing the C. albicans URA3 gene, pET3, was used as the template
for another PCR reaction. The primers 5URATT and 3URA were used to produce a
URA3 gene with synthetic PstI restriction sites at each end. This was cloned into
pUC19 and named pURA25TT. The URA3 gene was cut out of pURA25TT using
PstI and ligated into the NsiI site of pCALNsi, creating pCNURATT. The orientation of
the URA3 gene was confirmed by restriction analysis. pCNURATT represents the
complete pCal retrotransposon cloned into pUC19. It has a C. albicans URA3 gene
cloned into a synthetic NsiI restriction site at the 3' end of the pol ORF. The URA3
gene is expressed from its own promoter.
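Cloning a PstI fragment into an NsiI site works because the two enzymes leave identical four-base 3' overhangs. The sketch below derives the overhang from each enzyme's recognition sequence and top-strand cut position (CTGCA^G for PstI, ATGCA^T for NsiI, with EcoRI G^AATTC as a contrast); the Enzyme class and helper names are illustrative, and the model only handles palindromic sites cut within the recognition sequence.

```python
# Sketch: why a PstI fragment can be ligated into an NsiI site.
from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    name: str
    site: str  # palindromic recognition sequence, 5'->3' top strand
    cut: int   # top-strand cut position, counted from the 5' end of the site

    def end(self) -> tuple[str, str]:
        """Return (end type, single-stranded overhang) left by the cut."""
        n = len(self.site)
        lo, hi = sorted((self.cut, n - self.cut))
        if lo == hi:
            return ("blunt", "")
        kind = "5'" if self.cut < n - self.cut else "3'"
        return (kind, self.site[lo:hi])


def compatible(a: Enzyme, b: Enzyme) -> bool:
    """Sticky ends can anneal if overhang type and sequence match."""
    return a.end()[0] != "blunt" and a.end() == b.end()


def junction(left: Enzyme, right: Enzyme) -> str:
    """Top strand where DNA upstream of a `left` cut joins DNA downstream of a `right` cut."""
    return left.site[:left.cut] + right.site[right.cut:]


PSTI = Enzyme("PstI", "CTGCAG", 5)
NSII = Enzyme("NsiI", "ATGCAT", 5)
ECORI = Enzyme("EcoRI", "GAATTC", 1)

if __name__ == "__main__":
    print(PSTI.end(), NSII.end(), ECORI.end())
    print("PstI/NsiI compatible:", compatible(PSTI, NSII))
    for a, b in ((NSII, PSTI), (PSTI, NSII)):
        j = junction(a, b)
        print(f"{a.name}|{b.name} junction {j}; recut by either enzyme: "
              f"{j in (PSTI.site, NSII.site)}")
```

Both hybrid junctions (ATGCAG and CTGCAT) are recognised by neither enzyme, so the junctions themselves are not recut by PstI or NsiI. This is a general property of such hybrid sites, not a result reported in the text.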

The intention of this construction was that, as the retrotransposon was tagged with a
selectable marker, it could be analysed in auxotrophic hosts. As analyses of the other
clones representing the integrated form of pCAL progressed, some additional steps
were required to replace portions of the plasmid derived from p30 or p45 that did
not match the most common sequence of pCal. There were four differences over some
3.5 kb between p45 and p5, and twelve differences over a similar region between the
clones p30 and p36. One of the differences between p5 and p45 was an in-frame
stop codon in p45. The following changes were made to render the retrotransposon
portion of the plasmid identical to the most common sequence of pCAL. A StyI/Asp718
fragment from pCNURATT was replaced with the same fragment from p5, creating
pRPU1. All of the retrotransposon sequence from p30 and all of the pUC19 sequence
of pRPU1 were replaced with p36, resulting in pRPU2. This was achieved by
linearising p36 with Asp718 and BamHI and ligating the Asp718/BamHI fragment
from pRPU1 into it.

The last step in the construction of a plasmid capable of replicating in
both E. coli and C. albicans was to add the Candida Autonomously Replicating
Sequence (CARS). This was done by first subcloning the CARS element as an SphI
fragment from pRC2312 into pUC19. The CARS element was then transferred to
pRPU2 as a HindIII/BamHI fragment, creating pRPU3.



Construction of Reporter Gene Plasmids 

Two plasmids were constructed for use as reporter genes. Both of these contain a
CARS element and the C. albicans URA3 gene (see Figure 50). The URA3 gene and
the right LTR were amplified by PCR using pRPU3 as the template, with the primer
URAXMAS1 paired with 3LTR. The resulting PCR product was cloned into
XbaI/PstI-digested pK19 and named pUX1L. The XbaI/PstI fragment was then cloned
into pCARS, creating the plasmid pUX1LC. A fragment of the retroelement was
amplified by PCR from p36: the primer CAL2 was used with the primer CAL1 to
generate a 0.4 kb product. A kanamycin-resistant clone of p36 (p36K) was used as
the initial recipient for this PCR product. The product was cloned using the synthetic
SacI and XbaI restriction sites designed as part of the primers. This plasmid was
labelled p36Kf1. The XbaI/SacI fragment from this plasmid was then cloned into
pUX1L and labelled pTIM2. Expression of the URA3 gene in pTIM2 is driven by the
LTR promoter.
The plasmid p36f4UX1LC contains a CARS and the C. albicans URA3 gene, both of
which function in S. cerevisiae. A PCR product was made using the 101F and CAL5
PCR primers with pRPU3 as a template. It was cloned into p36K using the synthetic
XbaI restriction site of CAL5 and an internal BglII site. From this plasmid, p36Kf4, the
SacI/XbaI fragment was cloned into SacI/XbaI-digested pUX1LC, creating
p36f4UX1LC. The URA3 gene in p36f4UX1LC is present as an in-frame fusion to the
pCAL pol ORF.

Construction of Plasmids for in vivo Recombination in C. maltosa 

The C. maltosa ADE1 gene has been cloned in the plasmid pRA2 (Sasnauskas et al.,
1991). The gene was cloned from pRA2 into pUCK1 as a BamHI fragment and
labelled pNRE2 (see Figure 51). From this plasmid it was cloned as an SspI fragment
into HindII-digested pUC19 and named pNRE3. The HindII fragment containing the
kanamycin cassette from pNRE1 was ligated into SmaI-digested pNRE3. The
resulting plasmid, pNRE4, was restricted with Ecl136II/DraI, and the fragment
containing the kanamycin cassette and the ADE1 gene was cloned into the SspI site
of pUC19 to create the ampicillin-sensitive, kanamycin-resistant plasmid pNRE5.
Thus pNRE5 is a pUC19-based plasmid containing the adjacent kanamycin resistance
cassette and C. maltosa ADE1 gene inserted into the ampicillin resistance gene.

Construction of Plasmids for in vivo Recombination in S. cerevisiae 

The C. albicans ADE2 gene from pSM7 was excised using EcoRV and blunt-end
ligated into SmaI-digested pBluescript, destroying these sites. The resulting plasmid,
pBSAde2, was linearised with EcoRV, and the kanamycin element from pNRE1 was
blunt-end ligated in as a HindII fragment. The kanamycin element and the ADE2 gene
are adjacent in this new plasmid, pBSKanAde2. A fragment containing the first 900 bp
of pCal was cloned into SmaI/PstI-digested pUC19 and labelled pSP2. The kanamycin
element and the ADE2 gene were excised from pBSKanAde2 as an Asp718/SacI
fragment and ligated into Asp718/SacI-restricted pSP2. Thus the adjacent kanamycin
resistance cassette and C. albicans ADE2 gene are flanked by pUC19 on one side and
pCal on the other.

Transformations

The C. albicans, C. maltosa and S. cerevisiae strains were all transformed using the
method of Kelly et al. (1988) with some modifications. A 50 mL YPD culture was
grown to an OD600 of 0.7-1.3. After washing in 1 M sorbitol, the cells were
resuspended in 20 mL SCE, 22 µL β-mercaptoethanol and 150 µL of 1 mg/mL zymolyase
20T (Seikagaku Kogyo Co., Ltd, Tokyo). They were spheroplasted at 27°C until the
OD600 of 50 µL of cells in 1 mL of water showed a 50% drop compared with the 1 M
sorbitol reference. After washing, the cells were suspended in 1 mL STC and
incubated with the transforming DNA at room temperature for 10 minutes. 1 mL of
PEG solution was added and the cells were incubated at room temperature for a
further 10 minutes. The cells were pelleted and recovered in 1 mL of SOS at 27°C for
90 minutes. They were then plated in an osmotically buffered overlay onto minimal
media. Some incubation steps were performed at 37°C for the C. albicans and C.
maltosa strains.
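The spheroplasting end point above is a ratio of two OD600 readings: spheroplasts lyse osmotically when diluted into water but remain intact in 1 M sorbitol, so the fall in OD600 in water tracks the fraction of cells that have lost their walls. A minimal sketch of that calculation follows; the function names and the example readings are hypothetical, and only the 50% target comes from the protocol.

```python
# Sketch: decide whether zymolyase digestion has reached the spheroplasting end point.


def percent_drop(od_water: float, od_sorbitol: float) -> float:
    """Percentage fall in OD600 of cells in water relative to the 1 M sorbitol reference."""
    if od_sorbitol <= 0:
        raise ValueError("reference OD600 must be positive")
    return 100.0 * (od_sorbitol - od_water) / od_sorbitol


def spheroplasting_done(od_water: float, od_sorbitol: float,
                        target_percent: float = 50.0) -> bool:
    """True once the drop reaches the target (50% in the protocol above)."""
    return percent_drop(od_water, od_sorbitol) >= target_percent


if __name__ == "__main__":
    # Hypothetical readings taken at successive time points during digestion.
    reference = 0.40
    for od in (0.36, 0.28, 0.19):
        print(od, round(percent_drop(od, reference), 1),
              spheroplasting_done(od, reference))
```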


Plasmid Extraction from Yeast Strains 

50 mL YNB cultures supplemented with histidine were inoculated with the
transformants and incubated at either 27°C (S. cerevisiae) or 37°C (C. maltosa).
Confluent cultures were spun down and the pellet resuspended in 10 mL 10 mM Tris,
50 mM EDTA, pH 7.5. The cells were pelleted again and resuspended in 10 mL 50 mM
EDTA, pH 9.5, with 200 µL β-mercaptoethanol. After incubation for 15 minutes at
room temperature, the cells were pelleted again and resuspended in 10 mL 1 M sorbitol,
100 mM EDTA, pH 7.5 (SE). To this, 50 µL of 1 mg/mL zymolyase 20T was added. After
90 minutes of incubation at 37°C, the cells were pelleted. The pellet was resuspended in
10 mL SE with 100 µL 10 mg/mL pronase and 1 mL 10% SDS, and incubated at
37°C for 60 minutes. This was then extracted twice with an equal volume of
phenol:chloroform (1:1). Two volumes of 95% ethanol were added and the
precipitate spun down. The DNA pellet was resuspended in 100 µL TE. This was
transformed into E. coli, from which transformants containing the yeast plasmid were
purified according to Maniatis et al. (1982).

Results 


Site Directed Mutagenesis 

Using a plasmid containing the 3' end of the retrotransposon (p45E*) as the template,
two PCR reactions were performed: one used the universal forward primer and CalL,
and the other the reverse primer and CalR. Each of the resultant PCR products was
gel purified. The purified universal/CalL product was digested with EcoRI and NsiI, and
the reverse/CalR product was digested with BamHI and NsiI. The digested fragments
were ligated into EcoRI/BamHI-restricted pUC19. The resulting plasmid, pNsi,
contained the NsiI restriction site, as confirmed by restriction analysis. Sequencing of
pNsi confirmed that there were no other changes. The A at position 6135 of pCAL
was changed to a T, resulting in the change ATGCAA to ATGCAT.



In Vitro Plasmid Construction 

The construction of pRPU3 was achieved by conventional cloning methods. The
intermediate constructs were confirmed as correct by restriction analysis.
Steps in which portions of the new plasmid were derived from PCR products, or in
which the insert replaced a fragment of similar size, were verified by sequencing the
relevant region. Similarly, the intermediates and final products in the construction of
pTIM2, p36f4UX1LC, pNRE5 and pSPKanAde2 were analysed by restriction analysis
and, where appropriate, sequencing.

Transformation of S. cerevisiae, C. maltosa and C. albicans 

The three yeast strains transformed in this work represent the species from which the
retrotransposon was isolated, C. albicans (SGY269); a closely related species,
C. maltosa (CHAU1); and a more distantly related species, S. cerevisiae (GSY112).
Each of these yeasts was transformed with the newly constructed plasmid, pRPU3, and
with pRC2312, a plasmid known to transform both C. albicans and S. cerevisiae
efficiently (Jenkinson et al., 1988). The relative numbers of transformed cells per µg of
transforming DNA are shown in Figure 52. The efficiency of transformation was
determined for each of the yeasts. One pRPU3 transformant was found for every
400 viable cells in each of the strains. There was more variation in the pRC2312
transformations, ranging from 1 in 1,300 successfully transformed cells for S. cerevisiae
GSY112 down to 1 in 10,000 for C. albicans SGY269. Successful expression of
the URA3 gene required the transcription termination signals from the right LTR of the
retroelement. These results suggest that the signals for transcription termination are
present in the LTR and function effectively in all three yeasts.
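The transformation frequencies quoted above are easier to compare as fold differences. The sketch below uses only the three frequencies given in the text; the derived ratios are simple arithmetic, not additional data, and the per-µg values in Figure 52 are not reproduced. The dictionary labels and function name are illustrative.

```python
# Sketch: compare the quoted transformation frequencies (transformants per viable cell).
from fractions import Fraction

FREQUENCY = {
    "pRPU3 (each of the three strains)": Fraction(1, 400),
    "pRC2312, S. cerevisiae GSY112": Fraction(1, 1300),
    "pRC2312, C. albicans SGY269": Fraction(1, 10_000),
}


def fold(a: Fraction, b: Fraction) -> float:
    """How many times more frequent a is than b."""
    return float(a / b)


if __name__ == "__main__":
    prpu3 = FREQUENCY["pRPU3 (each of the three strains)"]
    for label, f in FREQUENCY.items():
        print(f"{label}: {float(f):.2e} per viable cell, "
              f"pRPU3 is {fold(prpu3, f):.2f}x this")
```

Taken at face value, these figures put pRPU3 at 3.25 times the pRC2312 frequency in GSY112 and 25 times that frequency in SGY269.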

In addition, the C. maltosa strain CHAU1 was transformed with pTIM2 and linearised
pNRE5. When the cells were plated onto minimal media supplemented with histidine,
they required either exogenous adenine and uridine or the plasmids carrying the genes
that enabled the cell to make these products. The URA3 gene was carried on the
plasmid pTIM2, and this plasmid could stably maintain itself as it contained the CARS
from pRC2312. The ADE1 gene, however, is carried on a plasmid that is not only
linearised, and hence unstable in yeasts, but also has no CARS and as such cannot be
maintained as an independent DNA molecule in the cell. A cell transformed with
pTIM2 and pNRE5 can survive on histidine-supplemented media by recombining the
plasmids with each other, by recombining the linear DNA into the genome such that
it is maintained by the host's origins of replication, or by reversion of the adenine and
uridine auxotrophies. Transformants were obtained that were able to survive on the
histidine-supplemented media. All of the transformants, when purified onto complete
media and grown overnight in 50 mL YEP media, lost the ability to grow on media
lacking uridine or adenine. This indicates that prototrophy was carried by one plasmid,
which was lost when its maintenance was not required. The natural promoter signals
and the transcription termination signals for the ADE1 gene are contained within the
plasmid outside the retroelement's LTRs. The URA3 gene in both pTIM2 and
p36f4UX1LC (used in the S. cerevisiae in vivo transformation) is not driven by its own
promoter as it is in pRPU3. It is driven by the promoter signals in the left LTR, and in
p36f4UX1LC it is part of a fusion construction with the gag gene of pCAL.

The S. cerevisiae strain MIB1 was transformed with the linearised pSPKanAde2 and
p36f4UX1LC. As with the C. maltosa transformation described above, the linearised
DNA must recombine with genomic DNA or with a plasmid carrying an origin of
replication in order to complement both auxotrophies. pSPKanAde2 has extensive
homology to the pCAL and pUC19 portions of p36f4UX1LC, which allows preferential
recombination between the plasmids rather than illegitimate recombination into
the chromosomes. p36f4UX1LC transformants were obtained on media
supplemented with adenine. Similar numbers were obtained from a
p36f4UX1LC/pSPKanAde2 transformation on adenine-supplemented media. Of these
transformants, up to 10% were also able to grow on minimal media, indicating that
in vivo recombination occurs with some efficiency even without selection. Growth
of these transformants on complete media results in the inability to grow on media
lacking either uridine or adenine, indicating that recombination has occurred between
the plasmids.

In Vivo Plasmid Construction 




The Ade1/Ura3 auxotrophic yeast C. maltosa CHAU1 was transformed with pTIM2
and linearised pNRE5. pTIM2 contains the Candida Autonomously Replicating
Sequence (CARS) and the URA3 gene and, as such, is maintained in Ura3 auxotrophic
yeasts as a multi-copy plasmid. pNRE5 will complement the Ade1 auxotrophy but is
unable to maintain itself as an independent element; to confer the gene's function,
it must recombine with some other DNA that is stably maintained. After
selecting transformants that were able to complement both auxotrophies, we
passaged colonies on complete media and repurified them on media lacking adenine
and/or uridine. The colonies were unable to grow under these conditions, indicating
that the function conferred by the ADE1 and URA3 genes was carried on a plasmid or
plasmids. Genomic DNA preparations were performed and plasmids rescued by
E. coli transformation. The plasmids were selected for their ability to confer resistance
to kanamycin, and replica plating showed that they were ampicillin resistant. Plasmid
preparations showed that there was only one plasmid and that it was larger than
either of the parental plasmids, pNRE5 or pTIM2. Restriction analysis showed that
this new plasmid contained restriction fragments unique to each of the parental
plasmids and hence was a chimera of the two.

Similarly, the Ade2/Ura3 auxotrophic yeast S. cerevisiae MIB1 was transformed with a
plasmid containing a CARS and the C. albicans URA3 gene, p36f4UX1LC, and a
linearised plasmid containing the C. albicans ADE2 gene, pSPKanAde2.
Transformants were selected that complemented the Ura3 auxotrophy and were
subsequently purified onto medium lacking adenine. About 10% of the transformants
that grew on the medium lacking uridine also grew on medium lacking adenine. After
plating on complete medium, the cells lost their ability to grow on media lacking
adenine and/or uridine, indicating that this ability was conferred by plasmid DNA.
Genomic DNA preparations from these cells were made and the plasmids rescued by
E. coli transformation. Plasmids were selected for their ability to confer resistance to
both ampicillin and kanamycin. Plasmid preparations showed that there was a single
plasmid larger than either parental plasmid. Restriction analysis showed that the new
plasmid contained restriction fragments unique to both p36f4UX1LC and
pSPKanAde2.

DISCUSSION

Transformations 

By constructing and transforming plasmids with different features, we have been able
to demonstrate that the new C. albicans retrotransposon-like element, pCAL, contains
promoter and transcription termination signals. In the plasmid pRPU3, a marker gene,
URA3, was ligated into the 3' end of the pol gene of pCAL. The URA3 gene
contained its own promoter sequence but no transcription termination signals. Thus,
to be expressed when transformed into the yeasts, its message could be
driven by either its own promoter or that of the retroelement, but it relied on
the polyadenylation signal in the right LTR to terminate transcription. The successful
transformation of three Ura3 auxotrophs, C. albicans SGY269, C. maltosa CHAU1 and
S. cerevisiae GSY112, indicates not only that the polyadenylation signal is functional
in the host species but also that it works in at least two other yeast species. pTIM2
and p36f4UX1LC also contain the C. albicans URA3 gene; however, neither of these
plasmids contains the URA3 promoter sequence. pTIM2 has the left LTR and
non-coding sequence of pCAL immediately upstream of the URA3 gene, while
p36f4UX1LC has the URA3 gene as a fusion product with the gag gene of pCAL.
pTIM2 and p36f4UX1LC were shown to function in C. maltosa CHAU1 and
S. cerevisiae MIB1 respectively. In addition, they both function in C. albicans
(data not shown).

In Vivo Recombination 

We report the in vivo recombination of two plasmids in both S. cerevisiae and C.
maltosa as a method for constructing plasmids too large to be easily constructed in
E. coli, or for constructing plasmids where no unique restriction sites are available.
Selection of recombinant plasmids only requires that one plasmid contain an
autonomously replicating sequence and that the other contain a selectable
marker. As both of the plasmids rely on each other for expression and
maintenance, there is positive selection for legitimate recombination. In the C.
maltosa CHAU1 transformation, the homology between the pUC19-derived portions of
pNRE5 and pTIM2 was used to direct recombination.

The MIB1 transformation results show that recombination occurs without selection in
up to 10% of the transformants. This is significant because it suggests that the
recombination machinery preferentially associates with naked DNA rather than
chromosomal DNA.

The plasmids constructed by in vivo recombination are potentially useful for analysing the frequency of transposition under various conditions. By including one marker gene (URA3) within the LTRs and another (ADE2) outside the LTRs of a complete retrotransposon, or of a functional portion of it, the frequency of transposition can be determined by analysing the proportion of cells which maintain prototrophy after growth on complete medium. The majority of cells will lose both functions with plasmid loss. Others will become prototrophic for one or both of the markers, due either to retrotransposition or to recombination. Transposition will integrate everything between the LTRs, including the URA3 gene; these colonies will be auxotrophic for adenine and prototrophic for uridine. Recombination between homologous regions of the plasmid and the genome (such as the LTRs or the marker genes) will result in the incorporation of plasmid information from both within and outside the LTRs; the resulting colony would be prototrophic for both adenine and uridine. The possibility of reversion of the phenotypic markers becomes increasingly important when analysing rare events such as retrotransposition. Where transposition occurs there will be an increase in the number of LTRs, which can be detected by Southern blotting, whereas reversion of the phenotypic markers will result in no increase in LTR number.
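The scoring logic above can be summarised as a short decision procedure. The Python sketch below is illustrative only: the function name `classify_colony` and its outcome labels are hypothetical, and it simply restates the reasoning in the text, assuming URA3 lies between the LTRs, ADE2 lies outside them, and a new LTR copy is scored by Southern blotting.

```python
def classify_colony(ade_prototroph: bool, ura_prototroph: bool, new_ltr_copy: bool) -> str:
    """Interpret a colony after plasmid loss (hypothetical helper).

    Assumes URA3 lies between the LTRs and ADE2 lies outside them.
    """
    if ura_prototroph and not ade_prototroph:
        # Transposition moves only what lies between the LTRs (including URA3)
        # and should add an LTR copy visible on a Southern blot.
        return "transposition" if new_ltr_copy else "reversion of ura3"
    if ura_prototroph and ade_prototroph:
        # Homologous recombination with the genome brings in plasmid
        # sequence from both inside and outside the LTRs.
        return "recombination"
    if ade_prototroph:
        return "not explained by this scheme"
    return "plasmid lost"


print(classify_colony(ade_prototroph=False, ura_prototroph=True, new_ltr_copy=True))
```

A Ura+ Ade- colony without an extra LTR band is the case the text warns about: it looks like transposition phenotypically but is better explained by marker reversion.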

The strong promoter within the LTRs does not repress expression from the adjacent URA3 promoter, although such repression, extending over several kilobases, has been encountered in other systems (the "Temin" effect). The most effective way to use selection is to place the prototrophic gene (such as URA3), with its own promoter, in the reverse orientation with respect to the retrotransposon (adjacent to the 3' UTR). The prototrophic gene is disrupted with an intron aligned in the forward orientation with respect to the retrotransposon. In this configuration the URA3 gene is non-functional (because of the intron) unless the whole element has been transcribed, the intron removed and the reverse transcript reintegrated. In other words, all the URA+ transformants are due to retrotransposition (rather than, say, random integration of the plasmid). This is the system used in Saccharomyces and Schizosaccharomyces.

Taken together, we believe that these results demonstrate that TCa2 is an active retrotransposon. This is further supported by the observation that the Southern pattern differs between strains, which also suggests an active retrotransposon. If TCa2 is active, it follows that it should disrupt genes at new integration sites. The pRPU3 results indicate that TCa2 can be 'tagged' with a URA3 gene expressed from its own promoter.

EXAMPLE 15

Use of TCa2 as an expression system and as a transformation system

We have demonstrated that there is a very strong, temperature-regulated promoter in the LTRs of TCa2. This is established by the abundant RNA detected on Northern blots. This is of considerable value as there is no other strong inducible promoter in Candida. Most genes from S. cerevisiae do not function in Candida, and this is probably due to promoter specificity (the reverse does not hold: most Candida genes do work in S. cerevisiae). This means that S. cerevisiae expression systems cannot be used in Candida. In addition, we have demonstrated that the LTR promoter works in Candida by placing a Candida URA3 gene in phase and adjacent (just 5') to the initiator methionine of ORF1. Such plasmids (pTIM1/2) function in Candida and confer uracil prototrophy on Ura- auxotrophs, which establishes that the promoter is working. Such transformations are, we think, reasonably efficient and the transformants reasonably stable.

A curious and interesting observation may explain this. Strains transformed with pTIM plasmids show an obvious band on agarose gels. This DNA is not pTIM and does not hybridise with TCa2; it consists of circular extrachromosomal copies of the ribosomal repeat element. The Candida replication origin used in pTIM, called CARS, was derived from Candida and is part of the ribosomal repeat structure. We believe that the abundant RNA transcribed from the LTR promoter in pTIM (and similar plasmids) causes the cell to 'up-regulate' the ribosomal system by producing free, circular, replicating rDNA plasmids. This would explain the circular DNA in pTIM transformants. If the up-regulation also acts on the CARS element carried by pTIM, the system will up-regulate itself in a positive feedback loop. That is to say, LTR-driven transcription up-regulates the pTIM CARS, which results in more replication of pTIM and more copies of pTIM; this in turn results in more transcripts from the LTRs and therefore even greater up-regulation of pTIM. The net result is efficient transformation and (more or less) stable transformants.

EXAMPLE 16

Use of a pCal construct to induce random mutagenesis

In order to 'tag' the retrotransposon the intention was to use an inverted ('back to 
front') intron inserted within a reporter gene (URA3). Such an inverted intron would 
prevent URA3 phenotypic function unless the intron is removed from the transcript. 
Initial DNA construct (schematic not reproduced): pURA UR(inverted intron)A3, DNA.

Transcript from the URA3 promoter (pURA): UR(inverted intron)A3.

The transcript cannot code for the URA3 gene product because the intron cannot be removed (it is in the reverse orientation).



Transcript (before and after splicing) from the retrotransposon promoter (pRet): pURA UR(inverted intron)A3 before splicing; pURA URA3 after splicing.

The transcript is not able to code for the URA3 gene product because, although the 
intron can be removed (processed or spliced), the URA3 sequence is backwards. 

Integrated spliced construct: pURA URA3, DNA.

Reverse transcriptase/integrase functions of the retrotransposon may act on the spliced pRet transcript, converting it into integrated double-stranded DNA. Once integrated, the copy in the genome will provide a functional pURA-driven URA3 gene.
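The orientation logic of the inverted-intron tag can be checked with a toy string model. In this Python sketch the exon and intron sequences are hypothetical placeholders, transcripts are written in the DNA alphabet, and "splicing" simply removes the intron when it appears in sense orientation; it illustrates only the orientation argument, not real splice-site recognition.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement (transcripts are modelled in the DNA alphabet)."""
    return seq.translate(COMPLEMENT)[::-1]


def splice(transcript: str, intron: str) -> str:
    """Toy splicing: remove the intron only if it is present in sense orientation."""
    return transcript.replace(intron, "", 1)


URA3_EXON1 = "ATGTCT"     # hypothetical 5' part of URA3
URA3_EXON2 = "AAAGCATAA"  # hypothetical 3' part of URA3
INTRON = "GTATGTTTTAG"    # hypothetical intron with GT...AG ends
FUNCTIONAL_URA3 = URA3_EXON1 + URA3_EXON2

# URA3 as read from its own promoter (pURA): the intron sits in reverse orientation.
ura3_sense = URA3_EXON1 + revcomp(INTRON) + URA3_EXON2

# 1. pURA transcript: the intron is backwards, so it cannot be spliced out.
pura_product = splice(ura3_sense, INTRON)

# 2. pRet transcript reads the opposite strand, where the intron is in sense orientation.
pret_spliced = splice(revcomp(ura3_sense), INTRON)

# 3. Reverse transcription and integration give DNA; pURA now reads an intron-free URA3.
integrated_ura3 = revcomp(pret_spliced)

print(pura_product == FUNCTIONAL_URA3)     # False
print(integrated_ura3 == FUNCTIONAL_URA3)  # True
```

The model makes explicit why only a copy that has passed through the pRet transcript can give a functional URA3.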

There has been no experimental work on introns in Candida, so we selected one possible candidate, the very small intron (mini-intron) from the peptide transporter gene (Basrai et al 1995). This was amplified by PCR and inserted into the URA3 gene in both the forward and the backward orientation. The forward orientation was a control to make sure that the peptide transporter intron would splice. As expected, it did.

URA3 gene with forward intron (DNA): pURA UR(intron)A3.
Initial transcript: UR(intron)A3.
Spliced transcript: URA3.



Again as expected, the backward intron failed to splice, even though it was the identical sequence inserted at the identical URA3 site.

URA3 gene with inverted intron (DNA): pURA UR(inverted intron)A3.
Initial transcript (cannot be spliced): UR(inverted intron)A3.

We have now mounted this URA3/inverted-intron element onto a retrotransposon plasmid, placing the element into a (synthetic) NsiI site at the 3' end of the coding sequence. We have also added an ADE2 element between the right LTR and the Candida ARS (CARS). This is summarised below.



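Construct (schematic not reproduced): left LTR, retrotransposon coding sequence carrying the URA3/inverted-intron element at its 3' end, right LTR, ADE2, CARS.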

In theory, the retrotransposon will be transcribed from the left LTR to the right LTR, the intron will be spliced out of the transcript, and the spliced element will be converted into DNA by reverse transcriptase and integrated. The URA3 element will then be transcribed from its own promoter to give a URA+ phenotype. There are possible problems with the pURA3 element interfering with transcription of the retrotransposon or with the reverse transcriptase, but these can only be found, and fixed, empirically. The ADE2 gene was added to the plasmid to give positive selection (as the URA3/intron is non-functional in the plasmid).
The plasmid is quite large and therefore not easy to work with, but it has been completed. The plasmid has been transformed into two ade2 ura3 strains, one carrying a URA3 point mutation and the other a URA3 deletion (a small deletion). ADE2+ transformants were selected and grown at 37°C to encourage retrotransposition. Cultures were then plated on minimal medium plus adenine. The plasmid is lost under these conditions and only URA+ variants (retrotranspositions?) can grow. Both strains gave URA+ derivatives. The ura3 point mutation is reasonably stable and the ura3 deletion completely so. We are therefore sure that these URA+ variants are not revertants. They are, we believe, a mixture of retrotransposition and gene conversion events. There is very little literature on gene conversion in Candida.



Schematic (not reproduced): plasmid carrying the left LTR, the retrotransposon with the UR/intron/A3 element, and CARS; URA3Δ (deletion) homozygote host; retrotransposed URA3 DNA with a characteristic Southern pattern; URA3+ allele due to gene conversion.



Gene conversion between the URA3 and the URA3/inverted-intron alleles can generate a URA+ allele that will have the wild-type Southern pattern.

The URA3+ colonies generated in these experiments were analysed by Southern analysis to confirm the presence of a new copy of TCa2 containing the URA3+ gene (Figure 55). The URA3+ colonies derived from H1051R all appear to contain the same putative retrotransposition event. The clones derived from H963R appear to contain different events, since several different sized bands are observed. However, some of the URA3+ colonies appear not to contain extra bands.

Some of the URA+ variants are clearly due to gene conversion. Others clearly are not: they give new bands of various sizes, which we think indicate retrotransposition into random sites.

EXAMPLE 17
Further analysis of URA3+ strains

We have carried out further analysis of the URA3+ strains thought to carry a new retrotransposition (URA3+ and giving 'unusual' Southern patterns when probed with a URA3 probe) (Figure 55).

Specifically, we have carried out inverse PCR (IPCR) after digestion of the DNA with TaqI (a 4-base cutter) and self-ligation. The IPCR primers correspond to:

i) the URA3 gene, at the site interrupted by the peptide transporter intron; and

ii) the boundary of URA3 and the TCa2 LTR.

These should only give a product following a retrotransposition event, since the intron must be removed before primer i) will work.
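Why two outward-facing primers amplify unknown flanking DNA only after self-ligation can be shown with a short Python sketch. The function `inverse_pcr_product` and all sequences are hypothetical; the model treats the ligated fragment as a circle and ignores the sticky-end detail of the TaqI junction.

```python
def inverse_pcr_product(fragment: str, fwd_primer: str, rev_site: str) -> str:
    """Top-strand sequence amplified from a self-ligated (circular) fragment.

    fwd_primer: top-strand sequence of the primer pointing into the unknown flank.
    rev_site:   top-strand site whose reverse complement is the second primer; it
                lies upstream of fwd_primer, so the two primers face outwards.
    """
    fwd_start = fragment.find(fwd_primer)
    rev_start = fragment.find(rev_site)
    if fwd_start < 0 or rev_start < 0:
        raise ValueError("primer site not found in fragment")
    if rev_start + len(rev_site) > fwd_start:
        raise ValueError("primers must face outwards (rev_site upstream of fwd_primer)")
    # Read the circle from the forward primer to the end of the fragment (LTR and
    # genomic flank), across the ligation junction, to the end of the reverse site.
    return fragment[fwd_start:] + fragment[:rev_start + len(rev_site)]


# Hypothetical fragment: URA3 (from a TaqI end), primer sites, LTR end, genomic flank.
ura3_part, rev_site, spacer = "CGAATTC", "GATTACA", "TT"
fwd_primer, ltr_end, genomic_flank = "CCAGGGA", "TGTTGG", "ACGTACGTAAT"
fragment = ura3_part + rev_site + spacer + fwd_primer + ltr_end + genomic_flank

product = inverse_pcr_product(fragment, fwd_primer, rev_site)
print(product)
```

On the linear fragment the two primers point away from each other and give no product; only the circle brings the genomic flank between them.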




Schematic (not reproduced): URA3 and LTR of the tagged element integrated into the genome following retrotransposition, with TaqI sites marked.

The inverse PCR products have been sequenced from several independent URA3+ colonies, and the sequence confirms that there has been a retrotransposition (the intron has gone) and that an additional retrotransposon copy has integrated into a novel site in the genome.
In summary, the system works. So far all the integrations are at different sites. Results are shown in Figures 59-62.

The ABI PRISM DNA sequence chromatograms of Figures 59 and 60 show that the 
URA+ tagged retrotransposon has undergone retrotransposition and integrated into a 
new site in the Candida genome. In other words it is an actual example of a random 
tagged integration/mutation event. 
Specifically:

DNA was isolated from the URA+ C. albicans, digested with the restriction enzyme TaqI, self-ligated and subjected to inverse PCR. The resulting PCR product was cloned and sequenced from the 'universal' forward and reverse primers.
The sequence H963RU59 defines the exact integration site of the retrotransposon. This integration site falls within the ORF of a membrane protein. This is not a unique event; the table in Figure 58 describes other integration events.

These integration sites do not seem to be associated with tRNA genes or with LTR sequences from TCa2 or other retrotransposons. The integrations seem to have occurred at a wide variety of sites, and the integration site sequences show no obvious homology to each other. In so far as a generalisation can be made from the present data, the TCa2 integrase seems to prefer to integrate near the 5' end of coding sequences (ORFs). This may be within the ORF (as in strain H963RU59) or within several hundred base pairs 5' to the ORF. Such integration could inactivate ORF expression, down-regulate or up-regulate it, or alter its regulation (for example, make expression of the ORF temperature sensitive).

This pattern of integration is unlike that of any previously described retrotransposon integrase. For example, in Saccharomyces cerevisiae, Ty1, Ty2, Ty3 and Ty4 integrate near tRNA genes, while Ty5 integrates into telomeric DNA. The TCa2 integration pattern is unlike those of previously described integrases and therefore could not have been predicted. The use of TCa2 as a random integration system is therefore a non-obvious application of this retrotransposon.

EXAMPLE 18

Evidence of temperature-dependent retrotransposition

Strain hOG1042, which contains TCa2, was grown in liquid culture (yeast extract, peptone, glucose) at 37°C. Serial subcultures were made every day for 3 weeks. A number of single colonies from this liquid culture were isolated on solid medium and DNA was extracted from them. These DNA samples were included in a Southern analysis using a probe that hybridises to the 3' region of the POL gene of the integrated retrotransposon.

The results of this Southern analysis (Figure 16) indicate the presence of one or more new bands in many of the strains cultured over the 3 weeks, compared with the original hOG1042. It is assumed that these new bands represent TCa2 integrated at new genomic loci, which implies that TCa2 has actively retrotransposed to generate new copies of itself at new positions. The size of the new band(s) varies from strain to strain, indicating that the new integration sites are different in each individual strain.
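Scoring new bands against the parental pattern amounts to comparing band sizes within a sizing tolerance. The Python sketch below is hypothetical: the function name, the 0.1 kb tolerance and the band sizes are illustrative assumptions, not data from Figure 16.

```python
from typing import List


def new_bands(parent_kb: List[float], derived_kb: List[float],
              tolerance_kb: float = 0.1) -> List[float]:
    """Bands in a derived strain with no parental band within tolerance_kb."""
    return [band for band in derived_kb
            if all(abs(band - p) > tolerance_kb for p in parent_kb)]


# Hypothetical band sizes (kb) for the parent and two single-colony isolates.
parent = [9.5, 6.2, 4.0]
isolate_a = [9.5, 6.25, 4.0, 3.1]  # 6.25 is within tolerance of 6.2
isolate_b = [9.5, 6.2, 4.0, 7.3]

print(new_bands(parent, isolate_a))  # [3.1]
print(new_bands(parent, isolate_b))  # [7.3]
```

New bands of different sizes in different isolates are what the text interprets as integration at different sites in each strain.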

EXAMPLE 19

Vector construction

The initial phase of the project involved the construction of a vector that could be used to characterise retrotransposition events in C. albicans. This vector contains the retrotransposon TCa2 and a selectable marker gene with an intron inserted. The URA3 gene from C. albicans was chosen as the selectable marker. Since the URA3 gene does not contain a native intron, a small intron from a C. albicans peptide transporter gene was used.

Insertion of an intron into the URA3 gene

The intron of the peptide transporter gene was inserted into the URA3 gene close to the start of the open reading frame (ORF). This location was used since most C. albicans introns are located near the 5' end of ORFs. The URA3 gene used contains only a short promoter region (130 bp) and no transcription termination signal, so as not to interfere with transcription of the retrotransposon. The intron was inserted in both forward and reverse orientations (with respect to the URA3 gene) to allow analysis of intron processing. The intron-containing URA3 gene was placed into TCa2 near the end of the pol gene, in both orientations (Figure 59).

It was found that the URA3 gene in these constructs was functional only when the intron was placed in the normal orientation with respect to the URA3 gene. In addition, the URA3 gene was functional in either orientation with respect to TCa2. Therefore the intron is capable of being processed correctly.

A construct was then produced which contains the URA3 gene in the reverse orientation with respect to TCa2, with an intron inserted into this gene in the forward orientation with respect to TCa2. An ADE2 gene and a Candida autonomously replicating sequence (CARS) were also present on this vector. The resultant vector was transformed into a ura3- ade2- C. albicans strain (hOG963). Transformants were selected using the ADE2 marker, grown overnight in minimal medium supplemented with uridine and then plated on minimal medium containing adenine but lacking uridine. If retrotransposition had occurred, URA3+ colonies would be produced as a result of splicing of the reversed intron from the URA3 transcript and therefore restoration of a functional gene (Figure 63). Several such colonies were produced; however, they all appeared to be the result of gene conversion between the plasmid-borne URA3 gene and the native URA3 gene. It was therefore decided to integrate the vector in the hope that this would reduce the frequency of gene conversion.
Integration of the retrotransposition vector

The CARS was removed from the plasmid used in the previous analysis, resulting in the plasmid pRUIA (Figure 60). This plasmid was digested at the unique XbaI site (within the ADE2 gene) and transformed into two ura3- ade2- strains of C. albicans, hOG963 and hOG1051, giving rise to the strains H963R and H1051R, respectively. A schematic diagram of the integration is shown in Figure 60. Southern analysis of strains containing the integrated pRUIA is shown in Figure 61.

Expression of the tagged TCa2 is temperature sensitive

It is known that in some C. albicans strains (for example hOG1051) TCa2 is expressed at higher levels at 37°C than in cultures grown at 27°C. To ensure that the full tagged TCa2 was being expressed, Northern analysis was performed (Figure 62). The results of this analysis indicate that the TCa2 construct containing the URA3 gene is expressed as one long transcript.

Retrotransposition in C. albicans

The strains H1051R and H963R (containing the integrated pRUIA) were used to analyse retrotransposition of TCa2. Since retrotransposition occurs via an mRNA intermediate, the intron inserted into the URA3 gene can be processed before reverse transcription of TCa2. The double-stranded DNA copy of the retrotransposon is then integrated into the host genome. Since the intron has been removed from the URA3 gene, it can produce a functional protein. A diagram of this process is shown in Figure 63.

C. albicans strains containing integrated pRUIA (H1051R and H963R) were grown overnight in rich medium (YPD) and then plated on minimal medium. If retrotransposition has occurred, URA3+ colonies are produced. An example of a typical experiment is shown in Figure 64.
The strain hOG1051 is known to overexpress TCa2 (Figure 62). The derivative H1051R gave rise to approximately 10-fold more URA3+ colonies than H963R. The estimated rate of URA3+ production for H1051R is approximately 10⁻⁵ URA3+ colonies per cell plated.
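The rate quoted above is the number of URA3+ colonies divided by the number of cells plated, and the fold difference between strains is the ratio of two such rates. The plate counts in this Python sketch are hypothetical, chosen only to reproduce the orders of magnitude given in the text; `colonies_per_cell` is an illustrative helper name.

```python
def colonies_per_cell(colonies: int, cells_plated: float) -> float:
    """Frequency of URA3+ colonies per cell plated."""
    if cells_plated <= 0:
        raise ValueError("cells_plated must be positive")
    return colonies / cells_plated


# Hypothetical plate counts, used only to illustrate the arithmetic.
rate_h1051r = colonies_per_cell(colonies=50, cells_plated=5e6)
rate_h963r = colonies_per_cell(colonies=5, cells_plated=5e6)

print(f"H1051R: {rate_h1051r:.0e} per cell plated")
print(f"fold difference: {rate_h1051r / rate_h963r:.0f}")
```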

The URA3+ colonies generated in these experiments were analysed by Southern analysis to confirm the presence of a new copy of TCa2 containing the URA3+ gene (Figure 65).

The URA3+ colonies derived from H1051R all appear to contain the same putative retrotransposition event. The clones derived from H963R appear to contain different events, since several different sized bands are observed. However, some of the URA3+ colonies appear not to contain extra bands.



Inverse PCR of tagged retrotranspositions 

In order to analyse the putative retrotransposition events further, inverse PCR was used to determine the sequence flanking the 3' end of the tagged TCa2. One PCR primer was designed to the URA3/TCa2 boundary and another to the site of intron insertion in the URA3 gene. These two primers are specific for the tagged retrotransposon, since the URA3/TCa2 boundary is unique to the integrated vector and to any retrotransposon insertions derived from it. The second primer requires that the intron has been removed, thereby only allowing the generation of PCR products from retrotransposition events. Primers were designed so that inverse PCR could be performed with the restriction enzymes TaqI or NlaIII. Both of these enzymes have four-base-pair recognition sequences. It was expected that this would allow inverse PCR of any integration event, since these enzymes cut frequently in the genome. A schematic diagram of the inverse PCR strategy is shown in Figure 66.
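Why a four-base cutter is expected to cut frequently can be estimated by assuming independent, randomly distributed bases: a 4 bp site then occurs on average once every 1/P bp, where P is the product of the base probabilities (256 bp at 50% GC). This Python sketch, whose function names and example sequence are hypothetical, locates TaqI (TCGA) and NlaIII (CATG) recognition sites and computes that expected spacing for a chosen GC content.

```python
import re
from typing import List

SITES = {"TaqI": "TCGA", "NlaIII": "CATG"}  # recognition sequences


def site_positions(seq: str, site: str) -> List[int]:
    """0-based start of every occurrence of site (overlaps allowed)."""
    return [m.start() for m in re.finditer(f"(?={site})", seq.upper())]


def expected_spacing(site: str, gc_content: float) -> float:
    """Mean bp between sites, assuming independent bases with the given GC fraction."""
    prob = {"G": gc_content / 2, "C": gc_content / 2,
            "A": (1 - gc_content) / 2, "T": (1 - gc_content) / 2}
    p_site = 1.0
    for base in site:
        p_site *= prob[base]
    return 1 / p_site


example = "GGTCGAATTCATGCCTCGAA"  # hypothetical sequence
for enzyme, site in SITES.items():
    print(enzyme, site_positions(example, site), round(expected_spacing(site, 0.5)))
```

Both sites contain two G/C and two A/T bases, so under this simple model their expected spacing is the same, and it lengthens somewhat in AT-rich DNA.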

Initially the inverse PCR products were cloned and sequenced, however once the 
inverse PCR was optimised the PCR products could be directly sequenced. 

Analysis of insertion sites of the tagged TCa2 



91 



tjk0879 



PATENT 
674521-2001.1 

Analysis of the Ty retrotransposons of S. cerevisiae indicates the presence of some target site specificity. Ty3, for example, integrates 1-4 nucleotides from RNA polymerase III transcription start sites; Ty1 integrates close to tRNA genes, while Ty5 inserts near telomeres. Although Ty1 tends to integrate close to tRNA genes, insertions into coding sequences have also been observed. From analysis of pre-existing TCa2 insertions in the public database, a target-site preference similar to those of the Ty elements is not observed. Instead, the data suggest that TCa2 has a preference for inserting into the noncoding DNA adjacent to ORFs.
Analysis of tagged TCa2 retrotranspositions reveals two main types of event in this system. Insertion site sequences obtained from URA3+ colonies of H1051R all appear to be the result of homologous recombination with TCa2 LTRs. The parental strain of H1051R is known to contain an abundance of linear TCa2 DNA; it is therefore possible that homologous recombination is occurring because there may not be a sufficient level of the retrotransposon integrase.

Target site sequences obtained from H963R URA3+ colonies again show some events which appear to be the result of homologous recombination into LTRs; however, these account for only about 40% of the events analysed. It should be noted that the proportion of recombination events appears to vary between experiments. The remaining sequences have target sites not previously found next to TCa2 elements; these events are thought to be genuine retrotransposition events. To date, the genomic locations of 14 insertions have been determined by comparison of the flanking sequences with the assembled C. albicans genomic sequence from the Stanford sequencing project. In addition, one insertion was found in a repeat sequence, and three other insertions could not be assigned to a contig because the sequence obtained was too short or that region had not been sequenced. These sequences have not been included in the analysis presented. Open reading frame maps of the regions flanking the TCa2 insertions are shown in Figure 67. With the exception of one insertion into a gene (H963RU59), all other events are in the intergenic regions between ORFs. No evidence could be seen for an association with tRNAs or RNA polymerase III transcription sequences, as is seen for Ty1 and Ty3.

In order to determine the target site preference of TCa2, various analyses have been performed. There appears to be a strong preference for intergenic regions. Figure 68 shows the distribution of insertion sites in relation to the nearest ORF. This may be the result of integration occurring via an interaction with transcription factors. If this were the case, a preference for the control regions of promoters would be expected. In support of this argument, most insertions are closest to the 5' end of ORFs rather than to the 3' region (Figure 67).
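Classifying each insertion relative to its nearest ORF, as in Figure 68, needs the ORF strand, because the 5' end of a minus-strand ORF is its right-hand coordinate. The Python sketch below is a hypothetical helper; the `Orf` record and the coordinates in the example are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Orf:
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate (inclusive)
    strand: str  # "+" or "-"


def classify_insertion(pos: int, orfs: List[Orf]) -> Optional[Tuple[str, int]]:
    """Return (position relative to nearest ORF, distance in bp), or None if no ORFs."""
    best: Optional[Tuple[str, int]] = None
    for orf in orfs:
        if orf.start <= pos <= orf.end:
            return ("within ORF", 0)
        distance = orf.start - pos if pos < orf.start else pos - orf.end
        # The 5' end is the left coordinate for "+" ORFs and the right one for "-" ORFs.
        upstream = pos < orf.start if orf.strand == "+" else pos > orf.end
        side = "5' flank" if upstream else "3' flank"
        if best is None or distance < best[1]:
            best = (side, distance)
    return best


# Hypothetical contig with one ORF on each strand.
orfs = [Orf(1000, 2000, "+"), Orf(2600, 3500, "-")]
for insertion in (900, 1500, 2450, 3600):
    print(insertion, classify_insertion(insertion, orfs))
```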

An attempt was made to determine whether there is any sequence specificity for the insertion site. A region of 500 bp on either side of the insertion site was analysed for sequence patterns; however, no consistent pattern was observed, indicating that there is no absolute sequence specificity of the TCa2 integrase. The only sequence pattern that could be determined for the integration site is a preference for AT-rich sequences close to the insertion site (Figure 69); however, this observation may be biased by the AT richness of C. albicans intergenic sequences. These findings are consistent with the above proposal that TCa2 integration sites are determined by the distribution of transcription factors, rather than by the integrase interacting directly in a sequence-specific manner with the target site DNA.
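The AT-richness comparison, and the caveat about intergenic base composition, can be expressed as the AT fraction in a window around the insertion divided by a background AT fraction measured on comparable intergenic DNA. This Python sketch is hypothetical: the function names, window size and background value are illustrative assumptions.

```python
def at_fraction(seq: str) -> float:
    """Fraction of A+T among unambiguous bases."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("no A/C/G/T bases in sequence")
    return (seq.count("A") + seq.count("T")) / called


def window(seq: str, pos: int, half_width: int) -> str:
    """Sequence within half_width bp on either side of pos, clipped to the ends."""
    return seq[max(0, pos - half_width):pos + half_width]


def at_enrichment(seq: str, pos: int, half_width: int, background_at: float) -> float:
    """AT fraction near the insertion relative to a background AT fraction."""
    return at_fraction(window(seq, pos, half_width)) / background_at


# Hypothetical sequence with an insertion between positions 5 and 6 (0-based).
seq = "GGGGAATTGGGG"
print(window(seq, 6, 2))              # AATT
print(at_enrichment(seq, 6, 2, 0.5))  # 2.0
```

If the background is taken from intergenic rather than whole-genome DNA, an enrichment close to 1 would support the caveat that the AT preference reflects intergenic composition.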

Removal of marker genes following retrotransposition

In order to construct further gene disruptions in strains which have undergone tagged retrotransposition, it would be desirable to have selectable markers available in these strains. Both the ADE2 and URA3 genes used as markers in these experiments can be removed, allowing reuse of these markers. Removal of the URA3 gene should be possible through homologous recombination between the LTR sequences. Such an event should leave a single LTR (solo LTR) at the site of insertion (Figure 70). It has been demonstrated in one of the H963R URA3+ strains that the URA3 gene can be removed by selection with 5-fluoroorotic acid (5-FOA). Analysis of these ura3- revertants is currently in progress.

In a similar way, recombination between the ADE2 genes surrounding the integrated pRUIA results in loss of the vector. These cells are now ade2- and can be selected by their red colour on selective media. This event can be seen in H963RU1 (Figure 65). Note the loss of the band which corresponds to the integrated pRUIA.
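Both marker-removal events are recombination between direct repeats: the two LTRs in the case of URA3, or the two ADE2 copies flanking the integrated pRUIA. The Python sketch below models the outcome as a string operation that keeps one copy of the repeat and deletes everything between the copies; the function name and sequences are hypothetical.

```python
def recombine_direct_repeats(seq: str, repeat: str) -> str:
    """Keep one copy of a direct repeat and delete everything between two copies."""
    first = seq.find(repeat)
    last = seq.rfind(repeat)
    if first < 0 or last < first + len(repeat):
        raise ValueError("need two non-overlapping copies of the repeat")
    return seq[:first] + repeat + seq[last + len(repeat):]


# Hypothetical sequences: genomic flank, LTR, internal region carrying URA3, LTR, flank.
left_flank, ltr, internal, right_flank = "GGGTTT", "TGTTGGAA", "CCCCAAAA", "CCCAAT"
integrated = left_flank + ltr + internal + ltr + right_flank

solo_ltr_locus = recombine_direct_repeats(integrated, ltr)
print(solo_ltr_locus == left_flank + ltr + right_flank)  # True
```

The same function applied with an ADE2 sequence as the repeat describes loss of the integrated vector.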

DISCUSSION

Analysis of the complete 6426 bp sequence of pCal revealed that it is a free (i.e. unintegrated), double-stranded DNA form of a new retrotransposon belonging to the Ty1/copia group. Initially, no significant similarity at the nucleotide level was found between pCal and any other sequence in the databases. This was not considered surprising, however, because reverse transcriptase has no proofreading function, so reverse transcriptase-based elements have a higher mutation rate than those utilising other polymerases. A more appropriate and useful analysis was to look for the conserved functional motifs expected to be present. Such regions are under tight evolutionary constraint and are often similar, even in highly divergent elements such as copia and gypsy. A close examination of the sequence revealed that pCal has many of the features commonly found in retrotransposons. These include the 280 bp long terminal direct repeats (LTRs) with short inverted repeats and putative transcriptional initiation and termination signals, a (-)PBS adjacent to the left LTR, a PPT adjacent to the right LTR, and two long ORFs, the first similar in size and position to the gag ORFs of other retroelements and the second containing motifs homologous to pol ORFs.

Within the gag ORF of pCal no nucleic acid binding motif could be identified. A CX2CX4HX4C nucleic acid binding motif is found within the gag ORF of some retrotransposons of the Ty1/copia group, for example Ta1, copia, 1731 and Tp1. However, this motif is not found in the functional retrotransposon Ty1. Taken together, all the features required for retrotransposition appear to be intact in pCal, suggesting that it is likely to be a functional retrotransposon.

The order of the motifs within the pol gene of pCal (protease - integrase - reverse transcriptase - RNase H) suggests that pCal is a member of the Ty1/copia group. In agreement with this, a phylogenetic analysis based on the reverse transcriptase genes of a diverse range of retroelements also placed pCal within the Ty1/copia group (Figure 12). This analysis, however, also revealed that pCal has no close relatives within the known set of Ty1/copia retrotransposons: pCal was placed as the most divergent element in a large group of retrotransposons containing representatives from plants (Ta1, Tnt1, Hopscotch and Tst1), insects (copia and 1731), a green alga (Osser) and yeast (Ty5). It is probable that the reverse transcriptase of pCal is functional, and therefore this placement of pCal is probably a genuine reflection of the divergent nature of this element rather than the result of the unselected accumulation of random mutations.

Within the LTRs of pCal there was no extended DNA sequence homology to the other C. albicans retroelements, TCa1 and beta. TCa1 and pCal do, however, share features such as similar inverted terminal repeats on their LTRs and a very similar PPT sequence, and they potentially utilise the same tRNA-Arg fragment as a primer. The TCa1 (-)PBS complements nine nucleotides at the 3' end of the tRNA-Arg fragment (bases 31-39). The pCal (-)PBS complements eleven nucleotides of the tRNA-Arg fragment (bases 29-39) and, similarly to what has been found in Ty1, Ty2 and Ty3, pCal has an additional sequence downstream of the (-)PBS which complements a further 6 bases (19-24) of the tRNA-Arg fragment.
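The complementarity statements above can be checked mechanically: a (-)PBS written 5' to 3' on the plus strand should equal the reverse complement, in DNA letters, of the stated tRNA bases. The tRNA sequence in this Python sketch is a hypothetical placeholder, not the C. albicans tRNA-Arg; positions are 1-based and inclusive, as in the text, and `pbs_complements` is an illustrative name.

```python
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def pbs_complements(pbs: str, trna: str, first: int, last: int) -> bool:
    """True if pbs (DNA, 5' to 3') is the exact complement of tRNA bases first..last.

    Positions are 1-based and inclusive, as in the text (e.g. bases 29-39).
    """
    segment = trna[first - 1:last].upper()
    expected = segment.translate(RNA_TO_DNA_COMPLEMENT)[::-1]
    return pbs.upper() == expected


# Hypothetical 39-nt tRNA fragment (not the real tRNA-Arg sequence).
trna_fragment = "".join("ACGU"[i % 4] for i in range(39))
print(pbs_complements("CGTACGTACGT", trna_fragment, 29, 39))  # True
print(pbs_complements("CGTACGTACGA", trna_fragment, 29, 39))  # False
```

The same call with positions 19-24 would test the additional downstream complementarity described for pCal.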

Given that pCal and TCa1 are believed to use a fragment of tRNA-Arg (nucleotides 1-39), it is of great interest to note that the retrotransposon copia uses the first 39 nucleotides of tRNA-iMet as a primer. It is not clear whether this fragment is the result of normal tRNA degradation. The copia primer may be a product of 'hyperprocessing' of tRNA-iMet by Drosophila RNase P. Hyperprocessing was defined as the processing of a mature tRNA to produce another functional RNA molecule, although, to date, the only assigned function of these tRNA fragments is as primers for retrotransposons. The RNA component of E. coli RNase P was shown to cleave tRNA-iMet at a number of sites, one of these being between nucleotides 39 and 40. The Drosophila tRNA-iMet and yeast tRNA-Arg3 have a very similar physical structure in terms of the numbers and positions of loops and stems, the residues in each loop, the number of base pairs in each stem and the total number of nucleotides in the tRNA. It is therefore possible that a similar hyperprocessing reaction occurs with a tRNA-Arg in C. albicans to produce the primers for pCal and TCa1.

If pCal is using a tRNA fragment for priming, there are implications for control of replication. An element using a whole tRNA as a primer has a pool of normal, functional tRNAs to draw on, even if the tRNA in question is a rare one. Elements using a fragment, however, have to contend with the stability of tRNAs and the possibility that once a tRNA starts degrading, it may be rapidly degraded further. Elements using a fragment as a primer will have to bind the tRNA after only partial degradation. This process could be a limiting step in reverse transcription and consequently control the copy number of pCal.

Most retrotransposons and retroviruses have been found to have their gag and pol ORFs lying in different phases on the mRNA. The necessary down-regulation of the pol gene with respect to the gag gene is thus brought about by the fairly low frequency of ribosomal frameshifting from the gag reading frame to the pol reading frame. There are, however, exceptions. For instance, the gypsy-type retrotransposon Tf1 from Schizosaccharomyces pombe has its gag and pol ORFs fused into one long ORF, so the gag and pol gene products are produced in equal amounts. The required excess of gag protein over pol enzyme is produced post-translationally, via an enhanced rate of degradation of the pol enzymes. Some insect and plant retrotransposons of the Ty1/copia group, for example copia, Ta1 and Tnt1, also have their gag and pol ORFs fused into one long ORF. In copia, at least, the down-regulation of pol occurs by frequent splicing of the mRNA to remove most
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of the pol ORF. The fact that the gag and pol ORFs of pCal are in the same phase 
implies that pCal is another retrotransposon that doesn't use frameshifting to down- 
regulate pol. Instead it seems likely that some form of stop codon suppression is 
required for translation of the pol ORF and this would also be likely to result in the 
5 down-regulation of pol relative to gag. It is therefore interesting to note that pCal has 
some structural similarities with mammalian type C retroviruses, such as Moloney 
murine leukemia virus (MMLV), in the vicinity of the gaglpol boundary. In MMLV a 
UAG stop codon which separates the gag and pol ORFs is suppressed with an 
efficiency of about 5%, being translated as glutamine. An 8bp purine-rich sequence 

10 immediately 3' to the stop codon and an adjacent pseudoknot structure are both 
necessary and sufficient for stop codon suppression. Mutations disrupting the stems 
of the pseudoknot impaired suppression and compensatory mutations restored 
suppression. Also the sequence of the purine-rich tract between the stop codon and 
the pseudoknot was found to be critical and it is likely that the length of this 

15 sequence is important. The MMLV read-through mechanism is not yet fully 
understood, but a pseudoknot-induced ribosomal pause at the suppressed UAG codon 
is likely to be involved. Similarly to MMLV, pCal has an 8bp purine-rich sequence 
immediately 3' to the UGA stop codon, although not the same sequence as in MMLV, 
and it has a putative pseudoknot (Figure 5). There is only the 8bp purine-rich 

20 sequence between the termination signal and the start of the putative pseudoknot. It 
is therefore likely that a similar form of read-through suppression is occurring in pCal 
and MMLV. 
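The two points of this argument (in-phase ORFs imply read-through rather than frameshifting, and read-through itself keeps pol scarce relative to gag) can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the function names and coordinates are hypothetical, positions are 0-based nucleotides on the mRNA, and it assumes every ribosome reaching the gag stop codon either terminates or reads through, with protein turnover ignored. The 5% value is the MMLV suppression efficiency quoted above; no read-through efficiency has been measured for the pCal UGA codon.

```python
def same_phase(gag_start: int, pol_start: int) -> bool:
    """Return True if two 0-based ORF start positions share a reading frame.

    In-phase ORFs can be joined by stop-codon read-through; ORFs in
    different frames would instead need a ribosomal frameshift.
    """
    return (pol_start - gag_start) % 3 == 0


def gag_to_gagpol_ratio(readthrough: float) -> float:
    """Molar ratio of gag-only product to gag-pol fusion product.

    Assumes a fraction `readthrough` of ribosomes reaching the gag stop
    codon continue into pol and the rest terminate; turnover is ignored.
    """
    if not 0.0 < readthrough < 1.0:
        raise ValueError("readthrough must lie strictly between 0 and 1")
    return (1.0 - readthrough) / readthrough


if __name__ == "__main__":
    # Hypothetical layout: gag starts at 0, its stop codon occupies
    # 1500-1502, and pol continues in the same frame from 1503.
    print(same_phase(gag_start=0, pol_start=1503))  # True
    # About 5% suppression (the MMLV figure) gives roughly 19 gag
    # molecules for every gag-pol molecule.
    print(round(gag_to_gagpol_ratio(0.05), 1))  # 19.0
```

A frameshift-based element would fail the `same_phase` test, which is why the in-phase arrangement in pCal points towards suppression of the stop codon instead.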

It has been reported that C. albicans and some other closely related Candida species
contain a tRNA capable of suppressing UAG and UGA stop codons. This tRNA,
tRNA-Ser(CAG), was originally identified as being responsible for the translation of the
universal CUG leucine codon as serine in certain Candida species. tRNA-Ser(CAG)
has some unusual structural features, and a recent report has even shown that it
can be charged to a low degree (about 3%) with leucine and can
incorporate this leucine into proteins during translation. This is one of the first
examples of the assignment of a single tRNA species to two amino acids. This
unusual tRNA was also implicated in some aberrant translational events. It was found
that when C. albicans tRNAs were added to in vitro translation systems, proteins
that migrated more slowly than expected on SDS-PAGE gels were produced. These
results were interpreted as evidence that C. albicans contains a tRNA capable of
suppressing UAG and UGA stop codons, and the tRNA responsible for these unusual
translational events has been identified as tRNA-Ser(CAG). However, the results could not
be explained simply by tRNA-Ser(CAG) being an omnipotent nonsense suppressor: the
amino-terminal regions of proteins synthesised in the presence of tRNA-Ser(CAG) also
migrated more slowly than expected on SDS-PAGE. At present it is unclear what
the actual effects of tRNA-Ser(CAG) are, aside from incorporation of serine at CUG
codons. This leaves open the question of which molecule mediates the
suppression of the UGA termination codon at the gag/pol boundary of pCal.
Sequencing the gag and gag/pol fusion proteins, and mutational analyses of the
regions surrounding the stop codon, could be used to determine the mechanism by
which the pol genes of pCal are translated.

The pCal system produces much more free dsDNA (estimated at 50-100 copies
per cell) than any other reported retrotransposon system. This is true even of the
system in which Ty1 of S. cerevisiae is expressed from a high-copy-number plasmid
under the control of the highly inducible GAL1 promoter. Such a GAL promoter
system is capable of producing about 10 dsDNA copies per cell, and the DNA requires
Southern blotting before it can be detected. We have detected integrated
retrotransposons, similar in sequence to pCal, which we have named TCa2. This
integrated form has been detected in a diverse range of C. albicans strains. Extremely
high levels of the free, linear, dsDNA form (pCal), however, have only been detected
in hOG1042 and its close relatives (descendants of iB65).
Overall, pCal presents itself as a highly unusual retrotransposon. While having many
of the features conserved among retrotransposons, it has a number of features that
set it apart from other elements of its class. For instance, translation of the pol
ORF seems to depend on pseudoknot-assisted read-through of a UGA
stop codon. This is similar to the mechanism used by mammalian type C retroviruses,
but has not been previously reported in retrotransposons. A phylogenetic analysis of
the reverse transcriptase sequences of a number of LTR retroelements showed that,
while pCal lies within the Ty1/copia group of retrotransposons, it is one of the most
divergent elements within this group. The most distinctive feature of pCal, however,
is that it exists at a high copy number as a free, linear, double-stranded DNA
molecule.



The TCa2 retrotransposon was originally discovered because of its appearance as an
abundant, extrachromosomal DNA molecule in Candida albicans strain hOG1042.
Sequence analysis of some clones of this extrachromosomal form of TCa2 (referred to
as pCal) showed it to be a basal member of the Ty1/copia class of retrotransposons.
Here we have extended the characterization of this element to include an analysis of
its integrated forms and a comparison of the expression of its RNA and
extrachromosomal DNA forms in a variety of C. albicans strains.


An important finding to emerge from this work is that there is a large amount of
variation among different C. albicans strains, both in the amount of TCa2 RNA and
extrachromosomal pCal DNA produced and in the genomic copy number of TCa2. It
is of interest that the number of integrated copies of TCa2 in the different strains
correlates with the amount of TCa2 RNA produced by each strain, and, again, that the
amount of TCa2 RNA in each strain is related to the amount of extrachromosomal
pCal DNA. The greatest numbers of integrated copies of TCa2, 10 to 12, occur in the
closely related strains hOG759 and hOG1042. About 5 copies are found in F16932,
and the other strains examined, SGY269, SC5314, ATCC10261 and SA40, each
have 1 or 2 copies. The highest levels of TCa2 RNA are also found in hOG759 and
hOG1042. The next highest level occurs in F16932, and the other four strains each
have a relatively low level. The greatest amounts of extrachromosomal pCal DNA are,
once again, found in hOG759 and hOG1042. Moderate levels of pCal are found in
F16932 and also in SA40. Low levels occur in SGY269 and SC5314, and, lastly, no
extrachromosomal copies of pCal at all were detected in ATCC10261. These
correlations between genomic copy number and abundance of RNA, and between the
abundance of RNA and the abundance of extrachromosomal DNA, suggest that a
large part of the variation among strains in the amount of pCal DNA and
TCa2 RNA that they produce is simply a consequence of variation in the number of
integrated copies. In other words, the genomic copy number of TCa2 is a
major determinant of TCa2 RNA levels, and TCa2 RNA levels are a major
determinant of pCal DNA levels. As mentioned in the results, however, the
correlations are not perfect, which suggests that other factors are also involved. To
reiterate: hOG759 and hOG1042 have roughly twice as many integrated TCa2 copies
as F16932 and ten times as many as the other four strains, yet they produce about 5
times and 50 to 100 times as much RNA, respectively; SA40 has about a fifth of the
TCa2 RNA found in F16932 and only slightly more than SGY269 and SC5314, yet it
produces similar quantities of pCal to F16932 and 10 to 20 times as much as the
other two strains; and ATCC10261 produces a slightly larger amount of TCa2 RNA
than SGY269 and SC5314, and a similar amount to SA40, yet it does not produce any
detectable extrachromosomal copies of pCal.
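The mismatch between copy number and RNA output can be checked with simple ratios: dividing the fold difference in output by the fold difference in integrated copies shows how much more each copy appears to produce. The Python sketch below uses only the approximate fold values quoted in this paragraph; the function name is hypothetical, and the calculation assumes that output would scale linearly with copy number if per-copy activity were the same in both strains.

```python
def per_copy_excess(copy_fold: float, output_fold: float) -> float:
    """Fold excess in output per integrated copy.

    copy_fold   -- ratio of integrated copy numbers (strain A / strain B)
    output_fold -- ratio of RNA (or DNA) levels (strain A / strain B)
    A value of 1 means each copy is equally productive in both strains.
    """
    if copy_fold <= 0 or output_fold <= 0:
        raise ValueError("fold changes must be positive")
    return output_fold / copy_fold


# hOG759/hOG1042 versus F16932: about 2x the copies, about 5x the RNA.
vs_f16932 = per_copy_excess(copy_fold=2, output_fold=5)
# hOG759/hOG1042 versus the four low-copy strains: about 10x the copies,
# 50- to 100-fold the RNA.
vs_low_copy = (per_copy_excess(10, 50), per_copy_excess(10, 100))

print(vs_f16932)    # 2.5
print(vs_low_copy)  # (5.0, 10.0)
```

A per-copy excess of roughly 2.5- to 10-fold is consistent with the estimate, in the next paragraph, that hOG759 and hOG1042 contain about five times as much TCa2 RNA as their copy number would predict.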

A simple explanation for the result with ATCC10261 is that the TCa2 elements in this
strain have suffered mutations that corrupt their RT gene or inactivate other
sequences required for reverse transcription, for example the polypurine tract. Such
mutations would account for the lack of extrachromosomal pCal molecules in this
strain. Accounting for the relative overproduction of TCa2 RNA in hOG759 and
hOG1042, and the relative overproduction of pCal DNA in SA40, is, however, not so
simple. In hOG759 and hOG1042 there is roughly five times as much TCa2 RNA as
would have been expected from a comparison of TCa2 copy number and RNA
expression in other strains. This suggests that one or more TCa2 elements in these
strains are being transcribed at a very high rate. There are a number of possible
explanations for this. Firstly, an element in these strains may have
suffered an alteration to its promoter region that makes it hyperactive, so that it
produces an abundance of transcripts. A comparison of the 5' regions of TCa2
elements from various strains (Fig. 9), however, failed to identify any significant
differences between the LTRs of hOG759 and hOG1042 and the LTRs of other
strains, although this does not rule out the possibility that such an element exists.
Another possible factor is the genomic location of the TCa2
elements. It is possible, for instance, that TCa2 retrotransposons are normally
integrated in regions of silent chromatin, as is the case with the Ty5 element of
Saccharomyces. If, for some reason, a copy of the retrotransposon became integrated
at an open or transcriptionally active region of the genome, this might result in
the overexpression of its RNA. Strain variation in proteins involved in regulating
transcription could also be involved in the overproduction of TCa2 RNA in hOG759
and hOG1042. These strains have been subjected to mutagenesis with UV radiation
and N-methyl-N'-nitro-N-nitrosoguanidine. It is possible that in the course of this
mutagenesis these strains have, for instance, lost a repressor of TCa2
transcription or suffered a mutation in some other transcription factor, with the result
that the TCa2 retrotransposons are now transcribed at a higher than normal
rate. Finally, it is conceivable that the higher copy number in hOG759 and hOG1042
titrates out a repressor molecule, leaving unrepressed elements that are then
transcribed at a high rate. Further experiments will be required to determine which,
if any, of these factors are involved.

Strain SA40 produces about 5 to 6 times as much pCal DNA as might have been
predicted from a comparison of TCa2 RNA and pCal DNA levels in the other strains.
This suggests that reverse transcription of TCa2 RNA is proceeding more efficiently in
this strain than in other strains. Again, there are a number of possible explanations.
For instance, the retrotransposon in this strain could have a superior RT, or the
genomic RNA may be more efficiently packaged into the virus-like particle where
reverse transcription occurs. Alternatively, it could result from some host factor, such
as increased availability of the primer tRNA fragment, which may be limiting for
reverse transcription. Whatever the cause, it is interesting that strain SA40 manages
to produce abundant pCal DNA from, apparently, just one integrated copy
of the element. This may make it a useful strain for further dissection of this system.
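The SA40 estimate follows from figures given earlier: SA40 has about a fifth of the TCa2 RNA found in F16932 but produces similar quantities of pCal. The Python sketch below expresses this as pCal DNA produced per unit of TCa2 RNA, relative to F16932. The function name is hypothetical, and the sketch assumes that, with equally efficient reverse transcription, pCal DNA would scale in proportion to TCa2 RNA.

```python
def relative_rt_efficiency(rna_fold: float, dna_fold: float) -> float:
    """pCal DNA per unit of TCa2 RNA in one strain, relative to a reference.

    rna_fold -- TCa2 RNA level relative to the reference strain
    dna_fold -- pCal DNA level relative to the reference strain
    """
    if rna_fold <= 0:
        raise ValueError("rna_fold must be positive")
    return dna_fold / rna_fold


# SA40 relative to F16932: about 1/5 of the RNA, a similar amount of pCal DNA.
print(relative_rt_efficiency(rna_fold=0.2, dna_fold=1.0))
```

The result, about 5, matches the 5- to 6-fold excess of pCal DNA attributed to SA40 above; the inputs are rough fold estimates, so only the order of magnitude is meaningful.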


Determination of the number of integrated elements in the closely related strains
hOG759 and hOG1042 revealed that hOG1042 has at least one more copy than
hOG759. There are at least three possible explanations for this: (1) a recombination
between the two LTRs of a retrotransposon in hOG759, resulting in the deletion of an
element; (2) a non-homologous chromosomal recombination, resulting in either the
duplication of an element in hOG1042 or the deletion of an element in hOG759;
and (3) a transposition event in hOG1042, resulting in an additional copy in this strain.
Intra-element recombination and non-homologous recombination are both likely to be
relatively rare events, so, given the abundance of full-length pCal molecules in
hOG1042 and the fact that the elements encode a potential integrase enzyme, the
most likely explanation of the extra copy in hOG1042 is that it is the result of a
transposition event since the divergence of this strain from hOG759. Since that
divergence, the strains have spent most of their time stored at -80°C, with no more
than a week or two of active growth. The discovery of what is likely to be a
transposition in hOG1042, in the short time since its divergence from hOG759,
suggests that the retrotransposon may be transposing at a high rate, which is perhaps
not surprising given the abundance of apparently full-length reverse transcripts. If this
element is still actively transposing, it may make a useful system for insertional
mutagenesis in C. albicans, as has been the case with Ty1 in Saccharomyces. Regarding
this last point, it is of interest that hOG1042 not only has more integrated copies of
TCa2 than hOG759, but has also suffered a de novo auxotrophic mutation (resulting
in a requirement for aspartate or proline when brought to homozygosity) that is not
found in hOG759. It is possible that this spontaneous mutation is the result of a TCa2
transposition event.

The expression of TCa2 RNA was found to be 5 to 10 times higher at 37°C than at
27°C. This contrasts with the expression of the C. albicans retrotransposon-like
element TCa1, whose RNA was found to be 20- to 30-fold more abundant at
25°C than at 37°C. The temperature-dependent expression of these retrotransposons
does not appear to be the result of a general temperature-dependent variation in
transcription rate, so it is probably a specific retrotransposon effect. It is not clear
what advantage the retrotransposons gain by regulating their expression in this
manner. It has been suggested that TCa1 could play a role in, for instance,
up-regulating genes that improve the chances of survival outside the host or,
alternatively, down-regulating genes that trigger host defences. Similar effects
could be proposed for TCa2. For example, transposition of TCa2 could up-regulate
genes required for maintaining an infection, or could down-regulate genes not required
outside the host. It would be interesting to identify the sequences within TCa1 and
TCa2 that are responsible for their temperature-dependent expression. Such
sequences may be widely used in C. albicans as a means of regulating the expression
of specific genes. The TCa1 and TCa2 promoters may also make useful
temperature-inducible promoters in transformation studies analyzing other C. albicans genes.

In our original description of pCal we estimated that it is present at 50-100
extrachromosomal copies per cell in hOG1042 (30). In Figure 7, however, the TCa2
probe can be seen to hybridize to a similar degree to the extrachromosomal and
chromosomal DNA from hOG1042 (37°C). The number of integrated copies in hOG1042 is
10 to 12, suggesting that, at 37°C, pCal is also present at 10 to 12 copies per cell.
This estimate may be misleading, however, because at least some of the pCal
molecules are likely to be located in the interior of a large, proteinaceous particle and
may therefore be lost during the DNA isolation procedure. In agreement with this, we
have found that the amount of pCal obtained, relative to chromosomal DNA, varies
with different DNA extraction protocols (not shown). The method used to isolate the
DNA for the Southern blot shown in Figure 7 gives a lower amount of pCal than some
other methods. An unbiased technique will be required to determine accurately the
absolute number of extrachromosomal pCal molecules per cell. The technique that we
have used in Figure 7 should, however, be a reliable indicator of the relative amounts
of pCal in the different strains and at the different temperatures.
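The per-cell estimate above scales the integrated copy number by the ratio of extrachromosomal to chromosomal hybridization signal. A minimal Python sketch, assuming (hypothetically) that signal is proportional to the number of probe-bearing molecules recovered and that each pCal molecule and each integrated TCa2 copy carries one probe target; because free molecules may be lost during DNA isolation, as noted above, the result is best read as a lower bound rather than an absolute count.

```python
def free_copies_per_cell(integrated_copies: float, signal_ratio: float) -> float:
    """Estimate extrachromosomal copies per cell from a Southern blot.

    integrated_copies -- integrated elements per genome
    signal_ratio      -- extrachromosomal / chromosomal hybridization signal
    Treat the result as a lower bound if free molecules are lost
    during extraction.
    """
    if integrated_copies < 0 or signal_ratio < 0:
        raise ValueError("inputs must be non-negative")
    return integrated_copies * signal_ratio


# hOG1042 at 37°C: 10-12 integrated copies, roughly equal signals.
low, high = (free_copies_per_cell(n, 1.0) for n in (10, 12))
print(low, high)  # 10.0 12.0
```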

An analysis of the 5' regions of TCa2 retrotransposons from the various strains
showed that some of these elements have minus-strand primer-binding sites (PBSs)
that are very long. One clone from hOG759 has a perfect 32-bp match to the primer
tRNA-Arg(UCU) fragment. The other clone from hOG759 and the two clones from
SC5314 also have 32-bp matches to the tRNA primer, allowing for 2 G-U base pairs.
The p30 clone of pCal from hOG1042 also has a 32-bp match, but with 3 G-U base
pairs. All the other clones have matches of 31 out of 32 bp to the tRNA primer, with 4
G-U base pairs. To the best of our knowledge, these 32-base PBSs are the longest
described. Most retrotransposons have PBSs that are 10 to 12 nucleotides long, for
example Ty1 (10 nucleotides). Retroviruses such as Moloney murine leukemia
virus have 18-nucleotide PBSs. After the TCa2 PBS, the next longest PBS of which
we are aware is 24 nucleotides long and is found in the magellan element of maize. It
has been shown that introducing a mismatch into the Ty1 PBS reduces the Ty1
transposition frequency at higher temperatures, while increasing the length of the PBS
increases the transposition frequency at higher temperatures. These
differences in transposition frequency are most likely due to differences in the
efficiency of initiation of reverse transcription. This suggests that long
PBSs are more efficient than short PBSs at high temperatures. The very long PBSs
found in TCa2 elements may thus predispose these retrotransposons to high levels of
reverse transcription at 37°C. On the other hand, it has recently been shown that
Ty1 binds its tRNA primer at regions in addition to the PBS, such
that 30 bases of Ty1 RNA are paired with primer tRNA. Disruption of as few as two
of these base pairs was found to have a drastic effect on transposition frequency. It
may be that a long PBS is necessary for efficient reverse transcription at 37°C,
especially for elements such as TCa2 that use a tRNA fragment just 40
nucleotides long, which offers little opportunity for additional regions
of base pairing.
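The PBS comparisons above count Watson-Crick pairs and G-U wobble pairs between an element's PBS and its tRNA primer. The Python sketch below shows one way to score such an antiparallel duplex, assuming both sequences are written 5' to 3' and are already aligned end to end with no gaps. The function name and the short test sequences are hypothetical; they are not the TCa2 or tRNA sequences.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def score_pbs(pbs: str, primer: str) -> dict:
    """Score base pairing between a PBS and a tRNA primer segment.

    Both sequences are given 5'->3' and must be the same length; the
    first PBS base pairs with the last primer base (antiparallel).
    DNA input is accepted: T is read as U.
    """
    pbs = pbs.upper().replace("T", "U")
    primer = primer.upper().replace("T", "U")
    if len(pbs) != len(primer):
        raise ValueError("sequences must be the same length")
    counts = {"watson_crick": 0, "wobble": 0, "mismatch": 0}
    for a, b in zip(pbs, reversed(primer)):
        if (a, b) in WATSON_CRICK:
            counts["watson_crick"] += 1
        elif (a, b) in WOBBLE:
            counts["wobble"] += 1
        else:
            counts["mismatch"] += 1
    counts["paired"] = counts["watson_crick"] + counts["wobble"]
    return counts


# Toy 6-nt example with one G-U wobble pair and one A-A mismatch.
print(score_pbs("GGAUCA", "AGAUCU"))
# {'watson_crick': 4, 'wobble': 1, 'mismatch': 1, 'paired': 5}
```

Under this scheme the hOG759 clone described as a perfect 32-bp match would score 32 paired positions with no wobble pairs, and the p30 clone 32 paired positions of which 3 are G-U.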


No hybridization of either the TCa2 internal or LTR probes was observed to DNA of C.
maltosa, C. tropicalis or C. parapsilosis, which are all close relatives of C. albicans,
nor to DNA of the more distantly related C. pseudotropicalis. This suggests that TCa2
is specific to C. albicans. Given the apparent ubiquity of retrotransposons in
eukaryotes, it is likely that these species have retrotransposons, but that these
have diverged sufficiently since speciation that they are no longer
detectable by hybridization to TCa2.

In most of the C. albicans strains that we have examined here, there is a fairly low
number of integrated copies of TCa2 (5 or fewer per genome). The full-length TCa1
element is also present at low copy number (just 1 or 2 per genome), and all the
retroelement LTRs found in C. albicans to date, as well as those of TCa1 and TCa2,
appear at a similarly low copy number of about 5 to 15 per genome. These low copy
numbers suggest a mechanism whereby transposition of retroelements in C. albicans
is held in check. In hOG759 and hOG1042, however, the copy number of TCa2 is
higher (about 10 full-length elements per genome) and appears to be capable of
increase. It may be that in these strains the TCa2 retrotransposons have escaped the
normal constraints on their replication and are thus transposing at rates well above
normal. If, as is most likely, the majority of newly transposed copies are themselves
capable of transposition, they may increase the rate of transposition still
further. It would therefore be interesting to see what would happen in these strains if
they were grown continuously for an extended period.
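The point that newly transposed copies could themselves accelerate transposition can be illustrated with a toy model. The Python sketch below is purely hypothetical: it assumes a fixed number of new insertions per active copy per growth period, ignores excision, selection and silencing, and compares a case in which only the starting copies are active with one in which every new copy is active too. The rate used is arbitrary, not a measured TCa2 transposition rate.

```python
def copies_after(periods: int, start: float, rate: float,
                 new_copies_active: bool = True) -> float:
    """Toy model of integrated copy number after `periods` growth periods.

    Each active copy yields `rate` new insertions per period. If
    `new_copies_active` is False, only the starting copies transpose
    (linear growth); otherwise all copies do (compounding growth).
    """
    if periods < 0 or start < 0 or rate < 0:
        raise ValueError("inputs must be non-negative")
    total = float(start)
    active = float(start)
    for _ in range(periods):
        total += active * rate
        if new_copies_active:
            active = total
    return total


# Arbitrary illustration: 10 starting copies, 0.5 new insertions per
# active copy per period, over two periods.
print(copies_after(2, 10, 0.5, new_copies_active=False))  # 20.0
print(copies_after(2, 10, 0.5, new_copies_active=True))   # 22.5
```

The gap between the two cases widens with every period, which is the sense in which active new copies would increase the rate of transposition still further.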
CLAIMS



1. An isolated and purified retrotransposon having a copy number of between
40-150 or 50-100 copies of free DNA of itself per genome.

2. A retrotransposon according to claim 1 which is linear. 

3. A retrotransposon according to claim 2 which is double stranded. 

4. A retrotransposon according to claim 1 which is isolated from fungi or yeast, 
including Candida or Candida albicans. 

5. A retrotransposon comprising the genetic material encoding at least one
polypeptide positioned between at least two long terminal repeats, and 
wherein the retrotransposon is capable of integrating into the DNA in a 
genome providing a copy number of between 40-150 or 50-100 copies per 
genome. 

6. A retrotransposon according to claim 5 which is isolated from fungi or yeast,
or Candida albicans. 

7. A method of introducing DNA into the genome of a cell which method
comprises introducing a transposable element comprising a nucleotide
sequence encoding a desired protein located between two long terminal
repeat sequences having the sequences illustrated in Figure 2B, which
element is such that it can insert into the genome of said cell in the presence
of an integration factor.

8. A method according to claim 7 wherein said integration factor comprises an
integrase which optionally is itself included in said transposable element and
which integrase is derived from the POL region of said pCal retrotransposon.

9. A transposable element for introducing a desired DNA sequence into the 
genome of a cell, comprising an internal domain for receiving a nucleotide 
sequence encoding a desired protein flanked by two long terminal repeat 
regions having the sequences identified in Figure 2B. 
10. A DNA transfer system comprising:

a) a transposable element for introducing a desired DNA sequence into the
genome of a cell, comprising an internal domain for receiving a
nucleotide sequence encoding a desired protein flanked by two long
terminal repeat regions having the sequences identified in Figure 2B,
said transposable element being capable of integrating into the genome
of a cell in the presence of an integration factor; and

b) an integration factor.

11. A transposable element according to claim 9 comprising an open reading frame
encoding an integration factor which is an integrase protein and which is
optionally encoded by a nucleotide sequence within the POL region of the
retrotransposon of Figure 2B.

12. An isolated and purified retrotransposon comprising a nucleotide sequence
selected from the group consisting of:

(a) The sequence illustrated in Figure 2B;

(b) A nucleotide sequence with at least 65% similarity with the LTR and
POL region of Figure 2B;

(c) A nucleotide sequence that hybridizes under conditions of standard
stringency to the nucleotide sequence shown in Figure 2B; and

(d) A functional fragment of (a), (b) or (c).

13. The integrated form of the retrotransposon claimed in claim 12 comprising the 
integrated form being herein designated TCa2. 

14. An expression vector comprising the retrotransposon of claim 1, 5 or 12. 

15. A method of gene disruption or altered expression comprising integrating a
retrotransposon of any one of claims 1, 5 or 12 into a site or sites in a yeast or
fungus or Candida wherein the retrotransposon contains elements that cause
gene disruption or altered expression at the site or sites; and, optionally, the
gene disruption or altered expression is non-revertible.
16. A gene discovery method comprising integrating a retrotransposon of any one
of claims 1, 5 or 12 into a site or sites in a yeast or fungus or Candida wherein
the retrotransposon contains elements that cause gene disruption or altered
expression at the site or sites, and, optionally, the gene disruption or altered
expression is non-revertible; and mapping the gene or genes disrupted, or
whose expression has been altered, by the retrotransposon.

17. A retroviral-like carrier system comprising the retrotransposon of claim 1, 5 or 
12. 

18. A transformation and expression system for fungi or yeast or Candida
comprising a retrotransposon of claim 1, 5 or 12.

19. A nucleic acid fragment selected from the group consisting of: 

(a) a nucleic acid sequence positioned between at least two long terminal 
repeats of the sequence of pCal as described in GenBank accession 
number AF007776; 

(b) a nucleic acid sequence with at least 65% similarity with the LTR and
POL region of the sequence of (a);

(c) a nucleic acid sequence that hybridizes under conditions of standard 
stringency to the nucleotide sequence of (a); and 

(d) a functional fragment of (a), (b) or (c). 

20. A nucleic acid fragment according to claim 19 in which the nucleic acid
sequence comprises a functional POL gene. 

21. A nucleic acid fragment according to claim 19 in which the nucleic acid
sequence comprises two long terminal direct repeats flanking a series of genes
in the order gag (group antigen), pol (polyprotein) where the pol sequence
comprises an aspartic protease, an integrase and a reverse
transcriptase/RNAseH, particularly as seen in Figure 2B.

22. A functional optionally temperature sensitive inducible promoter isolated from a 
retrotransposon of claim 1, 5 or 12. 
23. A retrotransposon selected from the group consisting of retrotransposons
1-28, whose sequences are given in accompanying Figures 17-48 and 71.

24. A method of assigning a function to a nucleotide sequence which method
comprises providing said sequence between the long terminal repeat sequences
of the transposable element according to claim 1, 5 or 12 and introducing it
into said cell and monitoring for the presence of an altered phenotype of said
cell compared to a cell which has not had said nucleotide sequence introduced
therein.

25. A method for gene disruption or altered expression comprising disrupting a
gene by active retrotransposition into a new site or sites in the Candida
genome of a retrotransposon, wherein the gene disruption or altered
expression is optionally non-revertible.

26. A method for discovering a gene comprising disrupting a gene by active
retrotransposition into a new site or sites in the Candida genome of a
retrotransposon, wherein the gene disruption is optionally non-revertible; and
mapping the gene disrupted.

27. An immunological, or immunogenic, or vaccine or therapeutic composition 
comprising a carrier or diluent and the expression vector of claim 14 wherein 
the vector expresses an antigen, or an epitope of interest or a therapeutic. 

28. The composition of claim 27 comprising an immunological, immunogenic or
vaccine composition, wherein the vector expresses an antigen or an epitope of
interest.

29. The composition of claim 27 comprising a therapeutic composition, wherein
the vector expresses a therapeutic.
30. A method for inducing an immunological response in a host including an animal
or a human comprising administering to the host the composition of claim 27.
31. A method for inducing a therapeutic response in a host including an animal or
human comprising administering to the host the composition of claim 28.
32. A method for detecting the presence of Candida comprising detecting the
presence in a sample of a retrotransposon as claimed in any one of claims 1, 5
or 12.
ABSTRACT



TCa2 is a Ty1/copia retrotransposon from the pathogenic yeast Candida albicans. In
contrast to other retrotransposons, it can appear as an abundant, extrachromosomal
double-stranded DNA molecule, called pCal. The invention relates to the isolation and
characterisation of TCa2 and pCal, together with their uses for inducing random
mutagenesis in a genome, as a component of a transposable element and of an
expression vector.
[Figure: excerpts of the annotated pCal nucleotide sequence (numbered to position 6426) with the deduced amino acid sequence, marking the polypurine tract (PPT1), the stop codon, TATA elements and the poly(A) signal; sequence detail not recoverable from extraction.]
[Figure: pairing of the pCal RNA with tRNA primer sequences (an S. cerevisiae tRNA is labelled) and an adjacent predicted stem-loop; base-level detail not recoverable from extraction.]

Polypurine tract (PPT) comparison:

TCa1 PPT:          GAATC-AGGG-AG
pCal 3' PPT:        AATC-AGGGGAG
pCal internal PPT:   ATCCAGGGGAG
[Figure: amino acid alignment of conserved pol domains of 1731, copia, Tnt1, Ty1, Ty4 and pCal; residue positions in parentheses, -n- = number of intervening residues; some residues are garbled in the source.]

Protease and integrase (zinc finger):

1731   (268) TQWCLDSGATSHMC    (397) HKRNGH -28- CKTC
copia  (287) CGrVLDSGASDHLI    (419) HERFGH -30- CEPC
Tnt1   (292) SEWWDTAASKKAT     (426) HKHHGH -25- CDYC
Ty1    (456) GHLLLDSGASRTLI    (599) HRMLAH -32- CPDC
Ty4    (410) KLVIIDTGSGVNIT    (562) KKRMGH -29- CQTC
pCal   (370) KYLWDTGATISW      (568) HLMSNH -29- CKVC

Integrase (continued):

1731   (518)  KIKCIRSDNGGEFVNNVFDDYLKAHGIARQLTIPHTPQQNGVAERANRTLVEM
copia  (543)  KVVYLYIDNGREYLSNEMRQFCV[...]
Tnt1   (543)  KLKKLRSDNGGEYTSREFEEYCSSHGIRHHIKTVPGTPQHNGVAERMNRTIVEK
Ty1    (729)  SVLVXQMDRGSEYTORTK-IKFI[...]
Ty4    (689)  KVREINSDRGTEFTNDQIEEYFISKGIHHILTSTQDHAANGRAERYIRTIITD
pCal   (687)  KVAYFRSDNAPEFPQPSDtxAEF...GIWRETIAAYSPEX^GIAEVVNKLILQQ

Reverse transcriptase:

1731   (880)  HHMDVCTAYLNSEL..KDTVYMKQPQGFTDAANPDQVLT t T rRKAIYGLKQSGREWN -32- ILVYVDDLIL
copia  (999)  HQMDVKTAFL^GTL.-KEEIYHRLPQGISCNS..DWCKLNKAIYGLKQAARCWF -34- VLLYVDDWI
Tnt1   (919)  SQLDVKTAFLHGDL..EEEIYMEQPEGFEVAGKKH^CKLKKSLYGLKQAPRQW -33- LLLYVDDMLI
Ty1    (1343) TQLDISSAYLYADI..KEELYIRPPPHLGM...NDKLIRLKKSLYGLKQSGAKWY -29- ICLFVDDMVL
Ty4    (1381) KTLDINHAFLYAKL..EHEIYIPKPHD RRCWKLNKALYGLKQSPKEWN -30- IAVYVDDCV1
pCal   (1461) QHIJDVESAYLNASITKSOTIYVFPPKSVPL..KKWHCWIXKRSVYGLKQSGLEWY -33- LGLYVDDILM

RNase H:

1731   (1129) AFTGFVDADWGGDRLDRKSYTGYV
copia  (1247) KIIGYVDSDWAGSEIDRKSTTGYL
Tnt1   (1174) ILKGYTDADMAGDIDNRKSSTGYL
Ty1    (1604) KLVAISDASY.GNQPYYKSQIGNI
Ty4    (1639) KVTAITDASV.GSEYDAQSRIGVT
pCal   (1734) VIECFSDASFAPG.LDRKSISGTL
[Figure: phylogenetic tree of LTR retroelements, with non-LTR retrotransposons also shown. Elements and hosts:
CfT-1: Cladosporium fulvum
Tf1: Schizosaccharomyces pombe
gypsy: Drosophila melanogaster
17.6: Drosophila melanogaster
Tom: Drosophila ananassae
Ty3: Saccharomyces cerevisiae
HIV1: Human immunodeficiency virus
RSV: Rous sarcoma virus
MMLV: Moloney murine leukemia virus
Osser: Volvox carteri
Ta1: Arabidopsis thaliana
Tnt1: Nicotiana tabacum
Tst1: Solanum tuberosum
Hopscotch: Zea mays
Ty5: Saccharomyces paradoxus
1731: Drosophila melanogaster
copia: Drosophila melanogaster
pCal: Candida albicans
Ty1: Saccharomyces cerevisiae
Ty2: Saccharomyces cerevisiae
Ty4: Saccharomyces cerevisiae]
Sequence alignment, positions 1-392, of isolates SGY-1, SGY-2, SC5-1, SC5-2, ATC-1, ATC-2, SA4-1, SA4-2, F16-1, F16-2, 759-1, 759-2, p30 and p36 (SGY-1 is the reference row; the other rows show only the differing bases).
(-)PBS
GAUUAGAAGUCAAAAGCGAUAACCAUUUCGCC
||||||||||||||||||||||||||||||||
CUAAUCUUCAGUUUUCGCUAUUGGUAAAGCGG
C. albicans tRNA-Arg (UCU) fragment

Also marked: stem-loop; gag ORF.
FIGURE 11A

LTR    internal



FIGURE 11B 




1 2 3 4 5 6 



FIGURE 13 



EcoRI 0.40
SacI 0.40
TG 0.80
HindIII, SphI, SphI, PstI, SalI, HindII, XbaI, BamHI
PstI 1.30
HindII 1.57
TGA 1.77
Asp718 3.79
SalI 3.98
StyI 5.06
EcoRI 5.86
CelII 6.13
ATG 6.17
(NsiI/PstI) 6.48
EcoRI 6.75
TAA 7.43
(NsiI/PstI) 7.44
TAA 7.47
>retrotransposon_1 1309bp LTR kappa: 698-977

CTGGATAAAGAAATCAGAAAAGAGATAGCAGGAAAACCAGGAAAAGGTGACGATGATGACGACGACAGT
TGGGGATCTGT 

GCCTGTTTCAATTCGAGTATTTGCTGAAGTTGAAAAGAAGTTGAAGCAAAAGAAAAGTTTGGCATCAAG 
CTAGATATTTA

TATATGTATATGATTAGACCAACATAAAACTAGACGTCCAAATATTTATTTATTTATTTATTGATATAT 
ATTCTTATTTA 

TTACTGTTATGATCTTTTGATTCACACAGAGATTTAATCCAAATCAATACCTTTTGTTTTGTAGAAATC
TTTTGCTTCTT 

CAATTTGTATTTTCAATTCTTTGTATTTATGTTCTTTGTCTTTGAATGTAACAATTCCCCAACCTAACG 
TTGATAAGGCA

TAAGACCCAAATGTGACTAATCCCCACCATGGCAAGTATGGCAATATTTCATCGTGTATTTTAGCTGGA
GTTGGAATCAC 

ACCTGTGATAAGAGCAAAATAAATAGCTGATAAGGCAAAAATTGTTAATCCTGTTTCAGTAGCTTTAGT 
CATTCTTATAG 

TTAGACTTGTTAAAGGGTAGTTGTGTTAATTGAAGATATGCTGGAAAACTATACTTTTCGTTGTTTTTT 
TTTTTCAATCT 

AGGTCGGGTGTGCTGTTATTTTTTTTCTCTCTTCTTGGTTCTTAGTATTGGATTATATGTTGGTTTATG 
CGACGTTTGTG 

TCAGGGAAATAACACCTTGATATAAGTCGTGCGTATTAGGTCAACATTGGTGAAAAATTTGCACTCATC 
GAGAGCCAGGA 

ATTAGTATAAAAAGAAGAGAAAAGAAAGATATTTAGGATATTTATTATATAGGGACCGAGTTTCAGGAG
ACACTTTTAGT 

GGGCGTAAACTTCATTCACTCTGTTTTTTGCTTATTACAAATTATCACCTATCGTGTACTAGGACTAAT
TCTCACGAATA

TTCCGTGTATACAAACACTTATTGCCAACTTATGGTGCGGAACTTTATTTGTCTGAACCAAAATCAAAG
TCACATCATTT 

AAATGAACGTTGACATAAATAGATTCTTTATTCAATAGAAACAATTTCTTCCTTTNTCTTTTCTTTGTA
TTANTGGTTAG 

ATTTCCATTCCATATACACACAAGATGTCAACGAAATCAGCAAATTCAACTGCTGTCAATTCATTTAAT
GCAAACCACTC 

CAACTATGACGTTTTTAGACCTTCATTCACCCCAGTTTTGGTCAATACATTCTTAGTACATCTTGGATT 
AGCTACGAAAA

ACCCAGATGACACTTTCACTTTTGACATA 



FIGURE 18 

>retrotransposon_2 1340bp LTR kappa: 770-1047

CCCNTTTGTNTGGTACATGTTAGACAGGCCCAAAAAATGGTATCATTTAGAACTGTATGGAGAACATTA
GTTTTGGTCCA 

ACATTGCGTGATGATGGTATNTNTTTCGTATTATAGTACAATGATGGCTCAATGATTNATTTTAGGTTT 
ATATGTGGATG 

ATATCTTAATGGACAGAATCTCAGATGGAATCGTTATCAGATTTGTTGAACAAGAGAGAGTTTATTTCG
CGTNAAAATCA 

ATTTAGGTCTCATGACAGAATATGTGAGATAAAATGTCCACGTAAGCAAAACTGGGTGATACTNTGAAT 
TAAGAGATACT

CCTAAATAAGCAAACCAAGGATNTTAAACTACACAANTCGTATGGTAAAACGTGCTTTGAGTNCCAAAT 
GATAGATGCGA 

GATACCAACAAAATAGNACTGTCGCAAATGCTGAANACAATTTCACTGAGGTTCGAAATGNAAAATNAC
TTAANTCAATT 

AAAAAATTTATACCAAAAGGTGGTCTGGAAGTGCTGATATGAACACGAAATTTAANGCATTCTGTGGAA 
AATTCGTTTAA 

GCTCACANTCGGAAAATACTACCATTCTACATTTGCAGAAAATTAAAATTGTGTTGTGAAATATCTACA 
TCCTACAAAGT

TCAAGACATTTATTGATGGTATATTCAAAGGACTCGATGTTGAGAATGATAATAACCTGAACCAAGACG 
CTACAAATGCT 

AATTGAGTAATTCGTAATTGCTAAACAACGCCATTTCGAATCAGGGGAGTGTTGGTTTATGCGACGTTT 
GTGTCAGGGAA 

ATAACACCTTGATATAAGTCGTGCGTATTAGGTCAACATTGGTGAAAAATTTGCACTCATCGAGAGCCA
GGAATTAGTAT 

AAAAAGAAGAGAAAAGAAAGATATTTAGGATATTTATTATATAGGGACCGAGTTTCAGGAGACACTTTT



AGTGGGCGTAA 

ACTNCATTACTNTGTTTTTTGCTTATTGCAAATAATCCCTATCGTGTACTAGGACTAATTCTCACGAAT 
ATTCCGTGTAT 

ACAAACAAAATCAGACTTCTTGGTAAGCCCAGCCGAAACAGCCATACTTCTAGTGGATCTTTCTATACT
ACAACATTCAC

ACTGCTTGACCTACAACTACACATATTCCTTGTTATAAGGGCAATCTATCACACAA&AGATTTACTGTT
GACTCACAAGA

TATCAACTGTAGTAATAAAGGAGTGCATTCTATGACCTTTGGAGAGGAACTATGTATAATATAAGAGAG
AAGGGACTAAA 

GATCTATATATAATGAGCAGGATGGGTAACCCGGTGGGGTATTAGCACGCACACGACCTG



FIGURE 19 

>retrotransposon_3 556bp LTR kappa: 1-216 

CAACATTGGGTGAAAAAATTTGCACTCATCGAGAGCCAGGAATTAGTATAAAAAGAGGAGAAAGAAGGT 
ATTTAGGATAT

TTATTATATAGGGACCGAGTTTCAGGAGACACTTTTAGTGGGCGTAAACTTCATTCACTCTGTTTTTTG
CTTATTACAAA

TTATCACCTATCGTGTACTAGGACTAATTCTCACGAATATTCCGTGTATACAAACATTATACGTGTCTG
TAACTACGCGA 

AACTACTTCGTCTCAGTTTTTTGTTACAAACAACTTTCCGTATAGACCTGAGATTTTGTCAGCTTGATT
GAATGGAAGAG 

TTTACTAAAGTACCAGAAAGGTGTTTTATAGATAACATGTAGATATATAAAAATGTTATATTACAAATG 
ACTTCCAAAAG 

AAACTGTACGAATTTTGCTGTTTATTAAAAACCAGTTCCTGAAAACTAGTATCTTAGCTTCAGTACATT
TAGCCCACCTA 

AATTGGACCTATGACAAGTTCTACTTTCCCGACAATGCTAATATAGAGCAGTTTCTTCTTCTTCTTCTT
CCTCGTC 



FIGURE 20 

>retrotransposon_4 2112bp Tca1-like LTR: 221-608

ATTTAATATGTTGGTATTGGCTACTGCCAACTTCTTAGCTGATGCAGATGCCATTGTTAATATTGTTAA
ATTGGGTAAAT 

AGTATGAAGGAAGCTTTGGCAGGCGTTGTTATTTTTTTCACCAATTATTATCATCACCTGCGGAGGTTA 
GTCAATTTGAG

ATTGTGCGAGGGAAAAAAAACGACCTCCATACACTACCTCAAGTATAAGTCCAGTCCAATTGTTCGCTA 
TAGAGAGATTT

CCTAGCCGGAATGCACGACAATCCTGAGACGGAAGTCGATCGTCGATGCCCATGGTGCGTGGTGAAAAA 
TTTTCTTAGAA 

AATTTGTTCTTTCCTTCAACTGCTTTGAAGAGAGGGAGGTTCAAGTGGTTTAAGTACGACGGTCACAAA 
GATTGCGGCTT 

ATGAGGCCCGAACTGAGTTGAAATACAAAATCAAGATATAATTATATACCTTACTTGTCTATATTGTTT
TATAATACATT

CTTCAGATATTTAAATTTCTGTGTATCATCCTATAAAACAGAGATACATTCAGTGCATTTAGTATACTG
AGTGAACTGGT 

ACCTGTGACATTCAAGATAACTGTTTCACGCACGCTGGCAGACGAACACCAATAGTATGATGAAGAACT 
GACCATGGTGT 

AAGAGGTTTGATGGAGTTTCTTTTTTTTAGAAGAGGTTGATAAGCCAACAGATGAGGAGTAACAAGTAA 
CTCGCAACATT 

GTATAACATAAGTTTACATCAAATCAGAATTTACTAAGAAAATCAATCCATTCAAAAGGCACTCAATCA 
TTGAAAAAACG 

AGCTTAATGAGTAGACGGTCTGTTCATATGAAACAATTGAAAGGGTTGAATATTGTTTGGAAAATTATA 
TAATTCATGTC

AAACTGGGAGGCTTAAATTATGGTCACTCCACAGATTATGAAACGTAGTTACACAATTCTTGGACCTGG
AAATCCCACAA 

GAGAGCGTTAGTTAGTTTGCACTCTCCTCACCAGTTAAACTACCCATGATTCTCCAATGTGGCTTATTT 
AAGTATCAGAC 

AACAGATACATGGTTTCCAAGTGGTCTCATTTTTGGTTTACTGGAGTCTGCATTCCCCACAAAAGTACC
TTTCAAAACTA 

ATTAATGTAGCTTCTATTTGATAGCCTCTGTTATGGAAATAGATTTGCTCTGCCCAGTGGGTGTAATTA 
TTCCCAGCTGG 



AACTATTCCGATAGATATGTTTTAATGTCAATTTAAATCTTGTAATAATAGTAAGGATGCGGTTTATCC 
GCGATCTTCTT 

AATACCTGTGGAGTTACTCCAGAACAGAGGTTCAATTTTTTCTTGGTTGGTAAATTATCCGAGTAACAC 
GGGGTAGCTTG 

GTTACTCCAGTTGAGAATGTAAACTATAGATGAAGATTTCAACACGCAATTATTACCCCACCTTGGCGA
ATTACTAATCG 

ACTATTTGTTAATCCAGAAAAAATTATACACAAACACTGCCTTTTTTTAAAAAAAGCGTTATTTTGATG
GAACGATAATT 

AACGATGGTTCTGCACAAAAATGTGGTCCAAAGCCCCAGACTATTCTGAAGTATGATTTGTTACTTAAT
TTAGTGAATAA 

TTAAACATAAAATCTGGAGAAAAATTTTTTTTTTGCTCTCATGACCAGTGGCAAATTCTTGGTAACGAG 
GCTTAACATTA 

ATCCGCAAATTACCTGGCAACAGAGAAAACACCCAGAAAGTTCTGTCGTATGAGAAAACCTACAGTTGT 
TTCCGATTTCT 

CCGAGCACTAAACATAAAGAGACCAGTAATGCTAAAAAAATTTTTATTTCTGCATTACTGTTTTTAGCA 
AATACACGTCT 

AATTTATTGTATTTGTTAAACATTCTTTTCCTGAAATTTTAAGAAAATGTTTTGGTTTGTTGGAATTCC 
ATTTAAACGGT 

ACTTTGGGGTGCAGACAGCAATCCATTTGGAGAGTGGCAAGTCTACACGAATTTAGCTAAGGTTCACTA 
TATCGTGTAAC 

AAGAAATTTCTATACCAAATAAACAGCACTTGATTGAACTACAATATGTAAAAACTTGCTTTTATTACC
AGTCTTCATAC 

ATACCCCGGTCTTCTCTTTTCAATATTCTGTA 
>retrotransposon_5 3742bp Tca1-like LTR: 2443-2830

TTAGAAAACAGGAAACAGCAATAGAGAGCAATAATTGAAAAATAGTGTTGTCAACAATAGAACAAATTG
GTCAAACTTTA 

AATGCAAAACATGAAATTCCCAATTTCCAGAATAAATAATATCAGCATACATGGCCCCGAAAACTACTT
TACCGTGTCGC 

TTTAACCCCCCCCTTCCTAAAACGAGACAATTAGACATACATTCCACAATTATCATAATCCCCTTTTTT
TTCCTTACAAA 

ACACTTTATTTTTGTCGTTTTCGTTATTTGCTTCGACGACATTGTAAACTCTTTGGATTTGCAGTAGTA 
GTGCTCCTGGT 

GTAAGGTGGGTTTGGTTGTAGAGTAAAAGAAACGACAATTGATTACACCTCGATATGCATACGCATGGC 
AAAGAGAATAC 

CGAGTTAATAGTGAGTCTATTAGTGTTGCAGGAAAAGTTATACGAACAACATTTTGTTTAGTGTGGATA 
TTCCAGATCAA 

CAACAATATGACTAAAATCATAGCTCTAATTTTCAGTTTACCTTTGTTTATTACGATACTGCCACAGTC 
GTGCTGTACCA 

GGGTCAGTTTTAGAAAAACTATTCTAGAAATGATGAGTAGAAATGTACTATTATGAGCAATATTTCAAA
AAGTGAAATTA 

TAATTGCTGCTGACAACACCAACAATACATACAAATTTGGAAACGAGCAAATCGAGAAAATTTCAATCC 
GTTTAGCAAGT 

TGTTCGTTGTCGTCATTGTCGATTAGTTTCAGTTTCTAGAGGTGAAATTTTCTATGGCACCAAAACCAA
AGCCTCAATTT 

TAATTTACTCTGTGTGGTACAAAATACATTAGAGAGGATCCTCTCCAAACAGGATTGCAGGAAGTTTTA
CACGAGAATGA 

TTTACTACACGACGTTGAATTAAAAAGCTCAACCAGTTTGTCAGCAATTTTGTTCTATCTGTTCAATTT
CTTGTATAAAA 

TAAAGCAATATGAGAGAGCATCTAAATCAATAATGTCAACACAATATTAAACTTTGAGAAGGATTGTTC
AACAAAACAAT 

CCGATGAATAGAAGAAGAATAATATCAAATTGTTCCTGATTGATTGTTGTTATTTATTTTTTATCTCCG 
AATTCCTGCAC 

AATGGCTCAACAACAGCCAACACGGATCACACATTAAATTTTTTTTTCGTGCAGGACCCCGTGGTGGTG 
GCTGTGGCTGT 

GATTGTGATCATTGTAGTTTCTGCCTTGATGATGACAAAAAATGATAGAGTTCAGTATGAGGAAGAAAT 
TAAGCGATATC 

GGTTTATGATGTGTTTAGTTATTAATTGCTCTCAATGGTTTTCAACAACGTATACAAAACTGGTGGTGC 
TTGAAACGAAT 

GAGTAATACAGATCTAATTAAGCTGTGATTTTCTAAGTTTGCCTTGTCTCTACAGTTCAAAAAAAAAGA 
ACAGAACACCT

CAGAGGCTGTTGTGATGCAATTTTTAGGAACCTCAACAACAACCACTGACTGATCTAAGCCAGCATCTG



TTTAATGGGTT 

TTCAAAAAGAATGGGGCAAACGGGGAATTGAACCCCGGGCCTCCTCGAATTTTGTGTTTGGTGAACAAC 
CCAAACGAGGA 

ATCATACCACTAGACCATTCGCCCAATTCGATGACTTGGAATTATTCTAGTTATTTTTGACATACAAAG 
CTCAGCTTTAT 

TACAGATAGTCATGTTTGCATGGATGAATTAGTACTACTAATAATATAAGAAAACTAGTTAATTGGAGT
CAATGTCTTAT 

ACATGTCTTCTGATGGGTTATGCATTGATTAATTATGAATTTCTTTTAAATACAATCTATTGCTATTAT
TTGTATGTAAA 

ACTTTACCCAAAAACCAACAAAAAAGAGTGGTCTTGGATAAAGATTAAAGTAATTCCAAAAAGATTTGG 
TAATTAGCTAT

ATTGTTTTGACGTACATCTATAACTACAAATAGCCATTCAGTTTGATTATGTATATTGACATAGTTGGA 
TTTGTAATTTC 

TGTTAAAATGGAAAACCCTAATCAAATGTATATGTTGAATAGGTAGTTAAATTGTACAACCTACTAGTT
GTTGTCAATTG 

AATTCAGAGCCAATACTTATATCTCCTGGAAACTGATACACAAACGAATTGTTAAACTATAACACTCGA 
CGTTCACATCT 

AAGGATTCATCGTCGTTAAGATTTATACTCATTAGCAAACTCACTTGCCATATTAAACACTTCTCAATC
TATTTCCCACA 

ATCCAATTAATCAGCACGAAAACTAAGATACTATATATATCTGCCTATACCTGATATACACATGGCACA
TGGCGTATCCC 

ACAAAAAACCGTCAAGACAACACCAATATGACAATGCCAATTATACAATTGCATATACCACGTGACTTC 
ATTTTATGGTC 

ATGAGAAATTAACTTATCATGGGGTTAGGCGAGAATATCAACTGTTCGCTATAGAGAGATTTCCTAGCC 
GGAATGCACGA 

CAATCCTGAGACGGAAGTCGATCGACGATGCCCATGGTGCGTGGTGAAAAATTTTCTTAGAAAATTTGT 
TCTTTCCTTCA 

ACTGCTTTGAAGAAAGGGAGGTTCAAGTGGTTTAAGTACGACGGTCACAAAGATTGCGGCTTATGAGGC
CCGAACTGAGT 

TGAAATACAAAATCAAGATATAATTATATACCTTACTTGTCTATATTGTTTTATAATACATTCTTCAGA
TATTTAAATTT 

CTGTGTATCATTCTATAAAACAGAGATACATTCAGTACATTTAGTATACTGAGTGAACTGGTACCTGTG
ACATTCAAGAT 

AACTGTTTCGCGCACGCTGGCAGACGAACATCAACACTGATCATTTGTTTTTTTTTTATTTCTCCTTTT 
TCTCCTTTTTC 

TTTCTTTTTTCTTCTTTCTTCAGACGTTGTTGATTTATTTTATCGACAGCATCCTTTTCTTTGGCCACA 
TATCCAAGCGA 

TATACTGGCCAAAGCGAAGTCCTTTTATAAAGCAATGCTACCAAATGTAACAGTTCGAGGTCAGAAGAT
TAAGCGGGTAT 

GTTCACACGGATATTTTATGGGGTATCACTTGTACCAA&CACTTTGATACGATAAGAATATTTGTAATA
CTAACTTCAGT

GTCTTTCATAATCAGCTCATAACCTGTTGGAATTTAAATTCGTATGTTGTTCATTCAAAATTTTGATAA 
ATGGGACGAGA 

AATCATCGTTGCCTCCTAATTAGATTATGACTTAGTACTAACTAAACTGTTTATCATTTTTTAAAGCGT
TGGGCTCCATG 

TTAGAATAGATTATTAGGGCGGTACGTATTTCATAATTTATATATAGGTACTTATTTTTACTAATTTAT 
TGCACAGGAAA 

AGATAAAAGGTATCGATTATACCTATCAGCAAGGTTTAAGCAAAATGAAGTATTTTTACCATATTTTTC 
CATTTTTATAT

AGATACATCAAGAGGTTTATTTTAAGTTCACCTGGATAAACCATTCAACTAACCCAATTGAATTGAATG
ACAATTTGATC

TCCAAAGAGGGATTCATTTCTATTCTGGAGAGATAAACGTCATTGTTTAGGAAAGAGCAAGAGATAAGA
AATCTTTTGTA 

TATTGTATATATATTATTAATGTTATATTACACTATTGTTTGTTTGTTTGTTATAATTATATGTGAGAT
TTCATATGTAA

GATGTTGTTATCTCTTTCCATTATTTAGCTTTTTTGAAAAAGCTATCAATGGCTCCACGTTT 



FIGURE 22

>retrotransposon_6 1438bp Tca1-like LTR: 91-479

GTGTAGATGCAATAGGTGTATGAAATGTATCTAGATTATATCATGAAGCCCTTGCCAATAAAATCTAGC 
CAAAAATTTGT 

GTACTGCAATTGTTCGCTATAGAGAGATATCCTAGCCGGAATGCACGACAATCCTGAGACGGAAGTCGA 
TCGTCGATGCC 



CATGGTGCGTGGTGAAAAATTNTCTTAGAAAATTTGTTCTTTCCTTCAACTGCTTTTAAGAGAAGGGAG 
GTTCAAGTGGT 

TTAAGTACGACGGTCACAAAGATTGCGGCTTATGAGGCCCGAACTGAGTTGAAATACAAAATCAAGATA 
TAATTATATAC

CTTACTTGTCTATATTGTTTTATAATACATTCTTCAGATATTTAAATTTCTGTGTATCATCCTATAAAA 
CAGAGATACAT 

TCAGTACATTTAGTATACTGAGTGAACTGGTACCTGTGACATTCAAGATAACTGTTTCGCGCACGCTGG 
CAGACGAACAG 

CAATTCTGTAATTGTCGTAGAGTAGCAACAAATCTTCCCGATGATTGGTACTTGTGTTAGTCTACACGA 
CATGTGTTTTG 

GTACACTTGAACTGTATGTCCAAGAATGGAAACATATGCGGGAAGGACGCGAAAGATGAGTTTGGTATA 
GAAGGGATAAG 

AACTGTAAAATATATTATGTAGTTATATATTTTAATTATGGGAAATTGAGTGTTTATTCTGTTCAACAA
GTTTCAACCGT 

AGAGATTACATTTAAAGTCTGTGGTCGAAATCCACAAGATACAGCAAATTCATGAATTCACCTATTTAA 
ATCAAGTTTAC

CAAGCACCATTGCCTAGAACTTGCCATATCATCAATTAAGTCAGACATTACTAATTTGAGCAAAGCTTT 
TAGCTTAATGG 

GCCAACTAATTTAAGTCGAATTGGTAATGCAATCTGTTCTTCATTTGAGTCGCTTGCTACGGCTCCATG 
ACACATCCATT

TGATTGTTTTAATTCGAGCAATTATCCACCATAACTCTCAGTAATATCATTAACAGTTTTACGCTTAAT
AAGCATAGAAA

GTTGTATGAAGTTGTCTCCTAGGTATGCTAGAGAGATTTGTATATACGACCAGTAAAGAGTGTGATGAG
GTGTTTACTGT 

AGGGTAAATTGCAATTGACTTGAGTTGATAGCGGTTATTACAAAAGTATAGATTCAACAAATTAAGACA
AGTACCAAACG 

ATAGGCCGAATGTGACTTATACCGTTGAAGTTCAAGCGTTTTTAACAAATAGAAATGTGAGATTAATGA 
GTTCGACAAAT

GTTTTACTAGATACTATTAATTTCGATGTACTATATAAGTTTAACCAGCTATAACCGGCAGAGCAGACT 
TCCTGAAACTC 

AAATTGGTTGTGTTTGGACTTGAGTTACACCACAAAGTTTGACAATCGTGAGGACATAGCAACCTATCA 
AGCCACTCA 



FIGURE 23 

>retrotransposon_7 1304bp Tca1-like LTR: 749-1133

TGAAGATCTGGCTTTGGCCAAAGTATCAGCTGCATTAGATACTGTCATTGGCATTGGCTTGAACCCACT
GGCTGTGGATG 

TAACTGTGGAGCCAAAAGCTCGTAAAGCTTTGGCGTTCATGGAGAAAAATCTTTTAACAGACATTGTAT 
AAACGTTGAAG 

ATTAAAGAAAAAAAAAACAGAAAGATTACGAATAATTTGTTTTTAATTGGTGGGTATGAGGTGTTGCGC
AGTCGACTCAA 

CAATTCTCTTTTGGTGCACAAAGTTGGTTTTATGGTCAACAATTACGGAGTACTGTCTGTAGTGATGTT 
GAATCTAAGAC 

GGAAATGCCTCCTTTACATTTGTTTCTATTCTCTTAAAATACATATTCAATTGTGTGTTTTAATTGAAA 
ATTTGTTCATC 

TTCATCTGATGATTGTGTAATCTTTGCGGGGGGGGGGCGTGTCATGAACCAATCTCTTTGAGTCATAGG 
ACGAGTCATCC 

TATTGTGACTCATGGCTCATCTTACTCTCTTACTAATCTCTTACTTCATCTGTTTACTATAAATATGTC 
TACTACTCCTC 

TATTTTATTACCTCGTTTACTATTTTTATTCAATATATGATCTTATCTTTAAATTTCTTTTGACAAATA 
CAATCAACTTA 

CAAAACAAAAGAAAAAAGACTAATAAAATAGAATTAATGAAAAAAAAAAAAGACTAATAAAAGAAAAAG
AAAGAAGACTA

ACAAAAGAAAAAACAAACCGGAGAACCCTTCGCTGTAGAGGAATTTCCTAGCCGGATTGCACGACAATC
CTGAGACGGAA 

TTCGATCGTTGATGACCGTGGTGCGAGGTGAAAAGTTTTCGTAGAAATTTTGTTCTCTCTTTCAAACTG 
CTTTTAAGAAA 

ATGAGGTTCAAGTGGTTTAAGTACGACGGTCACAAAGATTGCGACTTATGAGGACCGAACTAAGTTGAA 
ATACAAAATCA

AGATATAATTATATACCTTACTTGTCTATATTGTTTTATAATACATTCTTCAGATATTTAAATTCCTGT
GTATCATCCTA 

TAAAACAGACATACATTCAGTACATTTAATATACTGAGTGAGCTTGTATCTGTGACATTCAAGATATGT 
TTCGCGCACGC 



TGACAGACAAACATTTGGTTGTAAAAAAAAAAATATTGAAGAACCTCATCACCAAGATGTTTGAAAAAA 
AAAAAAATCAA 

ATACTTAATCGCAAGCTTTTCAATTTATTGATTGTTTGAATTAATTGAATATAAACAAAAAAAAAAAGA
ATTCAAATTCA 

TTTGACATGTCAGTGGAAGTTAGA 



FIGURE 24 



>retrotransposon_8 3604bp POL protein: 591-3575

AGCCCCAAAATGGTTTTCCTAGNGGAGGATGGAATGGATGGGACCACCCACCAATTTGGTTCCCGGAAT 
TTGGTTTAAAA 

AAAAGTTTACGGGGATGATTTATTTCCAAACCCAGATGTTTCCTGCTGCTGAAAGAATTGGAAAAGCTC 
TTTTCAGTNAC 

AATCTAACTGAGAGAACTTGAAAGGGATCAGCATTTTTGTTATGTCAACATTTAATGACCAATGAGCAC
CAGCACGATGA 

TATTATTCTTAAATTTCTCGTTAGCGGTGTCTCACCATGGTACTTACATCTGCAAATTTACATGCTGTC 
ATATAAACTTG

GATTCTCAAATTTGTTTTTAGAGATTTATGCTCAACATTATGAATTGTATAAAGCAGATCCCATTTACA 
AATTGCCAGAT

AGTATGACATTGTTGAATGAAATAAGATCAAATAGAGATTATCCTAAAGTGGTAAATGCTGCAAAAAAT 
ACAGTACAAGT

CAATAATGTTTCATCCAAGAACAATAAAAAGAAGGATGAATGACAACAATTAGCCAATAARATTGAGGA 
AGTAGGACGTT

ATAGCGAAATAAACGCAACATCTAGATATCATGAAATTGGCGATACCAACAAAAACCAAAGGACAATTA
ATATTGAATTT 

GAAAAATCATACAAAATTAAGTGAACAAAAGAAGAAAACAAACCTATTGGTATATGATCTGGGAGCCAC
AGTATCCGTGG 

TGAATGATAAGACTTTACTTAACGACATTAAAGAATCAAATATCGAAATTGCAACTGCTGAAGGGGAGA 
CATCTACGGCT

TATGCTTTAGGTACTCTAACCATATCTGTGAATGGATTGAATGCGAAATTAGATGGTGTTCTATACTTG
CCATCTATTCA 

ATTAAACTTAATATCTATAAAACAATTTGAAGATTTATGCTACGCAATTTTGATTTCCGAAAATTTAAT
GTTTCTAGTTC 

ACAGTGACCACGAACCTACGGTCATTGCGAAATATTCACCTAAAGATGACTTATACTCAGGCCCAAGAT
CGGGAAACTTT 

CTTAAGAAGAATCATAATGAACAAAACCAAATTTTGCTTGACACTGCTAAAAAACTATTAGGATCAGAG
AACATATTTCT 

GGAGAAATCACTGAAAAATCCAATGATTGATCAAGGAAAATTAGATCCGTTGAAAATGAACAATAAAGT
AGAAAGAGTTA 

ACTATGTCAGCATACACAACATCAAACAAGAAGTGGCAGACAAATATATGATAAAAGATCTTTACTACT
ATCATTTATTA

ATTAATCACCTTTCACATGAAAAACTACAATTATTAGTAAAAAGGGGAGTGATTAAACCAGTCAAATCT
ACTTCGGCTGA 

GTCGGCCATTTTAAATTGTCAGATATGTGTTGCAGCCCATGCAAAATTAGCTAGCCATAATCACACTCA
ACAACGGGAAT 

TGGAGCGACCATTACAACGCCTCCATTTGGATACCGCCGGACCATTTACCTCAAATAAAACTAAGAGCT
ATCTTACAACC 

GTGATTGATCAATTTTCCAGATATACTGAAGTTATTGTATCTGACACCAAAGCAGTCAAACAAAGCATA
TTGCATAGACT 

TAGGGTCTGGAACAATAGATTTCAGTTTAAGATCGCGGAGATAAGATATGATAATGCATTGGAGTATCC
ATCGGCTGAGG 

AGTTAGAGGAGTTAGGAATTTATAAACACCTTCTCCCAAACTACTCTCCTATGCTTAACGGTACAGCTG
AAGCAACCAAC

CGCCCCATTGTCCAAGGTATTTATAAGGTAGTGTTAAATTTTAGTTGTCAAGTATTAATACTTTTCCCA 
TTTATAGTGGA 

GTATGCGGTTCATATCCGGAATCATACACCTATAAAAGAATTTGATGGTGCTACTCCTTATGAACGTTA
CTATGGTTTAT

CTAAATACGTCATACCATTTTTTCAGTTTGGAACCGACGTTTTGATAAAATGTGCTAGTGTACAAGAAG
CTATTTCATTA 

AAACTACCATCTTCAAGAGATAAAGCTTTTCCTACAGTGATGTTTGGTGCTTTTCTCGGTTACGGCTCA 
GATTCCTTTAC 

CTTCAGAGTTTTAGTTTCCACGAAAGGATATCCAGTTATTACAACATCAAACATCCGTCCAATAGCGAC
GATGCAAGTAC 

TCAATGACTATTTGGCATACATATCGGAGAATAGCTCAATAAGCTATGACGATACATTCTTATCACCTT 



TGAATCACCCA 

ATGATTCGCACAAACCAACATGATAGACGTGGAGACAATATAAATGTCGAATATGAAAACCGTCCAAAT
GTACCATTTGA 

ATATCATGCTGAACCTCCTCGTACAAATTCATCGACGGGAATTATCGATCGACCAGATATTAGACCTAG 
AGCTGATCCCA 

CCTGGCAACGTATGCCTGATGCCAACATACATCAGGAAACAACAACTGTACAGACTCCTGATCATGGGG 
AGTTAGATACC 

ATGATCAACAACGAACACCAACTACCACGATCTGGGGAGGGTAATTACCCCGGGCAACAGGTGCGCACC
GATATTATTGG 

GCAATTTCGAGATCGCGGGCCTACCACTCTAAACACTCCGATCGATCTAGGTGTACCCGATGAAACAGA
CGATATTAGTA 

TGACATCAGAGAATCCAATTGATTCCCCAAATTCCGAGATGATCATATCCCCATCTTTACCCACAAATG 
AATTGGAACAT 

CAAATCGATATCAGTTCAGGGGAGATGTCGTTATTGCAAACGAATATGGAAGCAGATAACGAATTGAAA 
ACAAATGAAAT 

GGTATTATACAAATCAAAAAATGATGGTATTATCATTCAACAACAACAATTCACTGAAAATTTGTCAGA
TGAAAATGAAG 

AAGATTCATCAACAGATGAGGAAACATTGGAAGACAAAAAACAACAGCGATTGGAATATAATATTTCAC 
CAAACGATGAG 

TGGATAAATAATGACGTTCAGAACGAAGATGACACACAAGTGCCACATGTTAAGGAACCAATCAATTAT
GAAACTCAAAG 

TAGAAATGGAACAAACATGCCACGAATTGAAATGGGCATAATAGAAAACTTAAGTGATGATGGAAAGAA
TACACCACGTG 

AATTACGTATGGTCACCTACGATAATAATAAAAAAATTCAAAAGTACCAAAACAGTAATATCGAGATCC 
TGGAACCCAGA 

AACGAAAATAAAAACCACACATTCATTGAAAGCAACTTAGAATTACTTGACAATCAAGAAATGTTTCAA 
GAAGATCCTCA 

AGTTGAAGATATTCGATTGACAACTCCAAAAAAGGACAAATCGTTATCACCTGATTTCAATCAAACCCA 
TAATGAAATAC 

AACTATTCATGGCAGATATCAATGAAGATATGCTAGAAGAATATGATGAAAATATAAATATGAATGAAG 
TGTTAGCTGAC 

TCCACGGAGACGTTGGACAAAGAATTAGATTTAGATGAAGAAAGTGGAAGGATCGAATATATTGCTGAT
AGAGTTAGAAA

NAAGACAGAGGTACTGATGGTGCGCCACACGGGGAATTNTTTACAGAAAAATGGATAAAGATTTTTGGG

TCCATTAAAAA 

GGCC 
>retrotransposon_8 POL protein 995aa

MKLAIPTKTKGQLILNLKNHTKLSEQKKKTNLLVYDSGATVSVVNDKTLLNDIKESNIEIATAEGETST
A 

YALGTLTISVNGLNAKLDGVLYLPSIQLNLISIKQFEDLCYAILISENLMFLVHSDHEPTVIAKYSPKD
D 

LYSGPRSGNFLKKNHNEQNQILLDTAKKLLGSENIFSEKSSKNPMIDQGKLDPLKMNNKVERVNYVSIH 
N 

IKQEVADKYMIKDLYYYHLLINHLSHEKLQLLVKRGVIKPVKSTSAESAILNCQICVAAHAKLASHNHT 
Q 

QRELERPLQRLHLDTAGPFTSNKTKSYLTTVIDQFSRYTEVIVSDTKAVKQSILHRLRVWNNRFQFKIA 
E 

IRYDNALEYPSAEELEELGIYKHLLPNYSPMLNGTAEATNRPIVQGIYKVVLNFSCQVLILFPFIVEYA
V 

HIRNHTPIKEFDGATPYERYYGLSKYVIPFFQFGTDVLIKCASVQEAISLKLPSSRDKAFPTVMFGAFL 
G 

YGSDSFTFRVLVSTKGYPVITTSNIRPIATMQVLNDYLAYISENSSISYDDTFLSPLNHPMIRTNQHDR 
R 

GDNINVEYENRPNVPFEYHAEPPRTNSSTGIIDRPDIRPRADPTWQRMPDANIHQETTTVQTPDHGELD 
T 

MINNEHQLPRSGEGNYPGQQVRTDIIGQFRDRGPTTLNTPIDLGVPDETDDISMTSENPIDSPNSEMII 
S 

PSLPTNELEHQIDISSGEMSLLQTNMEADNELKTNEMVLYKSKNDGIIIQQQQFTENLSDENEEDSSTD 
E 

ETLEDKKQQRLEYNISPNDEWINNDVQNEDDTQVPHVKEPINYETQSRNGTNMPRIEMGIIENLSDDGK
N 



TPRELRMVTYDNNKKIQKYQNSNIEISEPRNENKNHTFIESNLELLDNQEMFQEDPQVEDIRLTTPKKD
K 

SLSPDFNQTHNEIQLFMADINEDMLEEYDENINMNEVLADSTETLDKELDLDEESGRIEYIADRVRXKT
E 

VSMVRHTGNXLQKNG 
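The POL interval 591-3575 given in the sequence header and the 995 aa length of the POL protein above are consistent with each other. Assuming 1-based, inclusive coordinates with the stop codon lying just outside the stated range (an assumption about the header convention, not something the figure states), the interval spans 2,985 nt, i.e. exactly 995 codons. A minimal sketch of that arithmetic; the function name `codon_count` is purely illustrative:

```python
def codon_count(start: int, end: int) -> int:
    """Return the number of codons in a 1-based, inclusive nucleotide range.

    Assumes (hypothetically) that the header coordinates are 1-based and
    inclusive, as is usual for sequence annotations.
    """
    length = end - start + 1  # inclusive range, so add one
    if length % 3:
        raise ValueError(f"{start}-{end} spans {length} nt, not a whole number of codons")
    return length // 3


# POL coordinates taken from the sequence header
print(codon_count(591, 3575))  # 995
```

If the range instead included the stop codon, the same calculation would give one codon more than the number of amino acids.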
>retrotransposon_9 1249bp Tca2-like LTR: 541-820

TCTCTATGTAGGCTGACAGGTGAAAATTATGAATTAATTGCATTGGCCAATGACAAATGAATAGACAAA 
ACAGCAAATAA 

GGTTGCAAAAGTAGCCCAAACAAACTAGATTTCGGTTACGAATTTTCCATCTTTCAAAACAATGAATTT 
GTTTAGAGCTC 

TGTGCCATTTATTGCAACTAAAATGAATATGCAATTAAACAATCAGAGATGTATTGGATTATCCCCGTG 
GTATACTTTTG 

AGTTCACCATTTGTTTTTTTTTTGGGGTTAAATTAGTGCTCCTACTAAAAATCGCATTTATCTTACACT 
CACCATTTTGA 

TAAGTTATCTCTGGTCAATCGCAAATACTATGCTTCTAATTAAGAGTTCTATGTAAATCCCATTTAATT 
TTGATCAATCT 

ATTGGTTTGAAGTAAGAGTTGATTTTCTGTAAAGATTTATTTGGCCAGTGTAGTTCGGTGTCAAAAATA 
TATTATGATGT 

ACACTAAAAAACACTAAATTTCAAGTCAATGGGGAACACAAAACTGAATTAATTACTATATGTTGGTTT
GTGCACTATTT 

TGTGTCAGAAACTGATCAATGAAAATGATGGTTATTATGAGAATGGAAAATTTTTCCATCACACATCAG 
GTGATGACAGA 

ACTAAACTATATTGTGTAGTATAAATAAGGGTATGAAATACCAACATCCCAGAATATCAACGAGATAGA
AGAGAGGAGTT 

TCAATATATATCTTGTGAATAATAACTTCGTTCTAATTCACTATACACAACTAGACGTGTACACGCTCA
ATCTCAGGTAA 

AGAAAGTTTATATTCCATCACTATATAACAACAATCAGGCTTTGCAAAAAAACATTTAAAACTAATACT
GGTAATATGGA

AATATAACGCCTCGTAGTTCTACGCACGTGGCATCCTTTATCTATTTATTCAATTTACCCCTAATTTAT
GAATTAGCTTA 

ATAAGAGCAGTCAAATTAACACGGCTCAATTAATAGTACTTAATAATATGAAGCCGATCAATTAACCGA
TCCTTTGAATA 

ATTTGAAAATAAAATAAAGTAATATAAATAGGTATGCATTTTCCCTACATTTATTTCCTCTTTCTATTT
TAATTTGTTTC 

CTAAACAGCAACAACAACAATTGAAATTCAAAAATGGTTTCTGTTTCTAAATTATTGAACAATGGATTG
TTATTAGCTGG

TCAAAGTGTCTTCCAAGATGTTGCTACTCCACAGCAAGCTTCTGTGCAA 
>retrotransposon_10 5611bp Tca2-like LTR: 1136-1414

TCTCTATGTAGGCTGACAGGTGAAAATTATGAATTAATTGCATTGGCCAATGACAAATGAATAGACAAA
ACAGCAAATAA 

GGTTGCAAAAGTAGCCCAAACAAACTAGATTTCGGTTACGAATTTTCCATCTTTCAAAACAATGAATTT
GTTTAGAGCTC 

TGTGCCATTTATTGCAACTAAAATGAATATGCAATTAAACAATCAGAGATGTATTGGATTATCCCCGTG
GTATACTTTTG 

AGTTCACCATTTGTTTTTTTTTTGGGGTTAAATTAGTGCTCCTACTAAAAATCGCATTTATCTTACACT 
CACCATTTTGA 

TAAGTTATCTCTGGTCAATCGCAAATACTATGCTTCTAATTAAGAGTTCTATGTAAATCCCATTTAATT 
TTGATCAATCT 

ATTGGTTTGAAGTAAGAGTTGATTTTCTGTAAAGATTTATTTGGCCAGTGTAGTTCGGTGTCAAAAATA 
TATTATGATGT 

ACACTAAAAAACACTAAATTTCAAGTCAATGGGGAACACAAAACTGAATTAATTACTATATGTTGGTTT
GTGCACTATTT 

TGTGTCAGAAACTGATCAATGAAAATGATGGTTATTATGAGAATGGAAAATTTTTCCATCACACATCAG
GTGATGACAGA 

ACTAAACTATATTGTGTAGTATAAATAAGGGTATGAAATACCAACATCCCAGAATATCAACGAGATAGA
AGAGAGGAGTT 



TCAATATATATCTTGTGAATAATAACTTCGTTCTAATTCACTATACACAACTAGACGTGTACACGCTCA 
ATCTCAGGTAA 

AGAAAGTTTATATTCCATCACTATATAACAACAATCAGGCTTTGCAAAAAAACATTTAAAACTAATACT
GGTAATATGGA 

AATATAACGCCTCGTAGTTCTACGCACGTGGCATCCTTTATCTATTTATTCAATTTACCCCTAATTTAT 
GAATTAGCTTA 

ATAAGAGCAGTCAAATTAACACGGCTCAATTAATAGTACTTAATAATATGAAGCCGATCAATTAACCGA
TCCTTTGAATA 

ATTTGAAAATAAAATAAAGTAATATAAATAGGTATGCATTTTCCCTACATTTATTTCCTCTTTCTATTT 
TAATTTGTTTC 

CTAAACAGCAACAACAACAATTGAAATTCAAAAATGGTTTCTGTTTCTAAATTATTGAACAATGGATTG 
TTATTAGCTGG 

TCAAAGTGTCTTCCAAGATGTTGCTACTCCACAGCAAGCTTCTGTGCAACAATATAACATCGTCAATTC
TCTTGGCGGTA 

GTGCCCCTTATATTCAAAGAAACGGATATGGGATTTCTACTGATATCCCTGCTGGTTGTGAAATTGCTC 
AAATTCAATTG 

TATTCAAGACATGGTGAAAGATACCCAAGTAAAAGTAATGGTAAAAGTTTAGAAGCAATTTATGCTAAA
TTTGAAAACTA 

CAAAGGTACTTTTAAAGGTGATTTGGCTTTCTTAAATGATTATACTTATTTTGTTACTGATAAAAACAA 
TTACGAAAAGG 

AAACTAGCCCAAAAAATTCTGAAGGAACCTATGCCGGTACAACCAATGCCTTGCGTCACGGTGCTGCGT 
TTAGAGCCAAA 

TATGGATCCTTATACAAGGAAAATTCAACATTACCAGTTTTCTCTTCCAATTCAGGTAGATGTTACCAA 
ACTTCAAGATA 

TTTTGCTAGAGGATTTTTAGGTGATGACTTTAAAGAAGGTAAAACTGTCAAGTTTAACATCATTTCTGA 
AGATGCTGATG

TTGGTGCCAATAGTTTGACTCCAAGAAGTGCATGTTCCAAGAACAAAGAACGGAGCAGTAGTACTGCCA
AAAAATATAAC 

ACAACATATTTAAATGCTATTGCTGAAAGATTAGTTAAACCAAACCCAGGTTTGAATTTGACTACAAGT
GATGTCAACAA 

TTTATTCAGTTGGTGTGCTTATGAAATCAACGTCAGAGGAAGTTCACCATTCTGTGATTTATTCACCAA 
TGAAGAATTCA 

TTAAGAACTCTTATGGTAATGATCTTTCCAAATATTATTCTAATGGTGCTGGTAATAATTACACCAGAA 
TCATTGGTTCA 

GTGATTTTGAATTCATCCTTGGAACTTTTAAAAGACACCGAGAACTCTAATCAAGTATGGTTATCATTT 
GCTCATGATAC

TGATTTAGAAATTTTCCATTCTGCTTTAGGATTATTGGAACCAGCTGAAGATTTACCAACATCTTACAT 
CCCATTCCCTA 

ACCCATACGTCCATTCTTCTATTGTTCCACAAGGTGCCAGAATATACACAGAAAAACTTCAATGTGGAA
ACGATGCTTAT 

GTTAGATACATTATCAACGATGCTGTCGTGCCAATTCCAAAATGTGCTACTGGTCCAGGGTTCTCTTGT
AAACTTGATGA 

TTTTGAAAATTTCGTTAAAGAAAGAATTGGAGATGTTGACTTTATTAAACAATGTGGTGTCAATAGTAC
CTACCCATCTG 

AGCTTACTTTCTACTGGGATTATAAAAATGTCACTTACAATGCTCCTTTAGAATTGTAAGACATCATTA
GATCAATTTAG 

ATATCCAAACATTTATTCGTTATTCTCTTCGTATATTATTTATATTCTTCCTTTTCTTGAA?^AAAAAAA 
TAGACAATTTA 

TTTAGACTTTATAACTTTTACTTCGTGTTGCAACAAATTGAGCATTTTACACGAAACTTTAA&TAATTG
AATCCTTCGAA

AACCAAAGTTTTATTGGTCGACGGGTTGGTTAACATGGAATATATCACTTTCTAATAACTATGTCACAC
CAACAAATATC

AATATGAGTGTTTCAGACAAATACCCAGAACTTGTTAGACAATTTTTCCTTCTTGATGAAGTGAAGGAA
ATTTTGCCGAA 

CTATCCAAAATACAAAATTTTACTGCAAACTCCTGAAGTCGATCGTGAATACTACAAAAACATCACCAG 
TCCTGAATTCA 

TTAGACAATGGCAGCCAGAAGTCCTCAATCACTACCGAAATAACTGGACCGAAGTCACTCCTCTTTGTG
CTATTGTACAT 

GATAGAACCATTGATGCCGGTTTGAGAATCCAAAAGTTTTTCCATCCATCCATCTTACCGAATGAACTT
CATGGCGATGT 

TTGGATACTGGTAAAAGAGAACAAAGAAGAACTCGATGCCTTTATAGAAAATGTGCAATGTCTTCAAAA
TTATGTTAGAG 

ATAGCTCCAACAGTAAATACACTTATTATCGTTGTGAGTATTGCAAAAAGAATAAAGGTGTTAAAAGTA
AAAAAACTGAT 

TGCAAGCATAAAATTGCAGTACATGCTCTTGAAGGTGGAAAATACAAAATAGTCTGGCACTTTCAGCAT
AACCATGCTTT 



CGATCCAAGAAGGATTACAAAGGCAACCAGAAACTGGTTGATGGACTTAGCTTCAACAAATATACCAAG 
GGCAAGTTCTG 

ACAGCAGGAGATCAGTGACTAAATTCAAACTGAGTTCATTTTTACTTTCTGACAAATTTAAAATTTCCA
ACAAGGTATTT 

AATTATTATAAAAACAAAAATAAAGAGAGCCAGGCACATCTTGACAAAAATGTTATCAAAAGTTTAAAA
ATATGGGTTTC 

ATATATAAATACCCTTAATGAATTTGCCGTGTTTAAAAAGAGATCAACAAATACTGAAAATGNTGAATT
CTGTGACGTGG 

AAGGCGATGCTCTGAATCCTGAGTCTACGTGGTATTTTGGAATTATTCTTTTGAGCAATCTCCAATATA 
TGCTGAGCCCA 

CAAACTGTTTTCCTTGATAGTACACATAAATTAGGCCACGGCCCTCACAACGAGGACATAATAACATAT
ATCTTTATCAC 

AAAAAGCTCTTTATCTGGAGGAGGGATACCAATAGGTTACTTAATAACAAATAGAGAGTCTCATGAGCC
GTTAGCATCAT 

TTTTGAGATTTTTTGTTGAAAAGAAAATACAAATCAAAAGATTCGTGATAGATTGTTCAGCTACTGAAA
TAAAAGCTATT

GAAGAAGGATATAATGTTGGTATCATTGAACCCACAGATGGATCATCAAGTGCTGGTGATAAATTTGAA
GCTATCATAAC 

GTTTTGCACTTGGCATTGTTTGAGAGCTTTTAATAAGACCATTAACAAACTTATTACAATACAAAATAG 
AACAAATAATG 

AGCAAATATCCCCAAATGAAATTATCACAGAAGTTGACGGAGAAATGACAGATGAAGAATTCATAAATC 
AGATAGCCACT 

CAAGGGGTTGTTGCACAATCAAACTTAACTGCAGGTAGGAATAAGGAAGAGATAATTGCAAATCAAAGA 
ATTGCTCTTTC 

ATATATGGTAGAATTAAAACGGAAAAAAGCCATTGAAGAAGCTAATGATTTTTTGCATGTAATCGAAGC
CACGTTTCGGG 

AATACCCGGACTTTGTTGCATACGCCCAGAAAACATTCAAAACCACAGGGAAATACTGGTTAAACTGCC 
ATTTTGGTAAT 

TACAGAGAACTTACAAATAATTGTGTGAAAAGTTATCACCAAGTTTTAAAAACCAAATATTTCGAAAGA 
CGCAGAAAATA 

CCGAGTTGACCGAGTAATTTGGATGTTTATTGAACCCATTGCTAAGTACTATGAGTATTACCATTCAGC 
TGTTATTGTTA 

CATCCCTGTTAAGGTACATTGATAAAGCTGAAGAAGCTTCCAAACTCAAAGCAGAAGCAGTTTCAGATG 
AGGACATGAGG 

CAAATGATTGTTGACCTTCCAGGTTATATTGCAGTTAAATCGTTCAATGGATCAAATTATTACAAGATT
AGTTTTGGTGA 

ACGTGGAATCTTTTCCTGCGAATGTCCGTACAACGAGTATTCAATTGATTGGTGCAAACACATTTTCTT 
ATATAAGCGTT 

ATAAGGTGGCTAAAGGATTGGACATACCTATTGTCGAGCTTGAAAGAAACCCTTTGGCTGACTTAAGTG
GTTTAAACGGT 

ACTAATGAGATAGTTGAACGAGAAACAGATACTATTGGAAATGAATCAGAAGACGAAGAATTAGTTGAT
AGTGAGTCAGG 

ATTTAAGAATGCCACCTATAATGAGAGTGATTTTGGTGACGATAATTTTGATTCTATGGAAAATGATCC 
AGATGGTGACG

AACCAGATTTTAGTATTGAAAACACAGAACCAACTGAAGTATCCCAAGAAGAGACCGAAGAAGAAATTG
GTGCCAGGCTT 

GCACGCGACAGAGTTGATCCTGGGTTCTCCATAGATGACGACAATATTGGAAACGACTTCGAACTCGCT
GACTCTTCTCA 

AGTTTTTACAGACGGTGGAACAGCTTATTACACACAAAACACAGAATCAGACCCATTTATTGAATGGCC
TATAAGTGAAA

CAATTGATCTGCAAGAAAGTGCTGATGTTATTTTAGAAATCGAAAGCATAGAAGGGGTTTATGCTAAGA
AAGCTGCTAGA 

AATATTAAGCAACGGGAAGAGAATTATAGTAGTTTAGATACAGAGGTTAAAAGAATTCAAGATGAGGAG 
AAATCTCAAAG 

GGAGAAGGTTAAAAAGCTAAGGGCATTAATTAAAAAAGAAGAGATGGAACATAAAAAGAAAATGGCGGC 

AGTGAATAGGA

TTCAAAAGAAA 



FIGURE 28 

>retrotransposon_11 1308bp Tca2-like LTR: 136-416

TGGTGCCATTTTTAGAATTGATGTCTGAAATAGAATATGAGGTCCAGAGAAGTTTTATTTTTGTTATAC
ATCATTTTTTT 

TTTTTGCTTTGTCTCACCGAATATTATTTGATTCCTAAAAAATTGTAATACCCTGTGTTGGTTTGTGCA 



CTATTTTGTGT 

CAGAAACTGATCTATGAAAATGATGGTTATTATGAGAATGGAAAATTTTTCCATCACACATCAGGTGAT 
GACAGAACTAA 

ATTATATTGTGTAGTATAATAAAGGGTATGAAATACCAACATCCCAGGATATCAATTATATAGAAGGGA
AGGAGTTTCAA 

TATATATCTTGTGAATAATAACTTCGTTCTAATTCACTATTCACAACTAGGCGTGTACACGCTGAATCT 
CAGGTAAAGAA 

AGTTTATATTCCATCACTCTGAAGTCATACATTAATATTAAATAAACAATCTAACACTAGCATGCATTC 
ATAACCTATAG 

ATCATTCTAAACAAGCTGTTAACACAAATCCAATCAATTGAATTTATCATATAATGAAGTAACTTTTTT
CAAGGCAACAT 

CTATTCTTTTATTAATCTCGACGTCTGTTTGATTAAGTTGCTCTAACATTTTATTTAGATCCTTCTCTA 
TATTTTCTGCA 

ATATCAAACACCGATTGCTTTTTGTCTGAAGTTGCTGGTATATCACCACTTCCGCCAATTGTCGTATTT
CCACTGTCCTT 

TGTTACTGACAGATTGGCACTGACATTACCTGAATTGTTCATGTTTGCTGTTGAAAGAGCAGGAACTGT
ACTTGGATAAG 

CAGCCGATTCAAAAGAAGATGTGGACATGAGTGTCAAGAAAATGTGTAGAATCAGTACAAGACTGGAAA 
ACAGAAGGAAC 

AAAGTGAACTGGATATTGTAGTTTTGTTGATAGTACTCGCGAGCTTTAATTTTTTTTTGTAACTGGCGG 
AATCAGATCTT 

ATGCAATACTCAAATCCAAAGAAACAGTCAATCCAGATGAAAGGCATGTAATCGCTAGTTTTCATAAAC
AGAATCATGTT 

ACTAGTCATATTTTCTATAAAAATTCAATACTTCATTCTTTTTGTTCAATACTAACTATAAATGCTTAC
AAATAGATTCA 

AATTTCAACCAGATCCACCACTTCATTAGGCTCAACCAATTCTTCATAAATAGAAACGTCTTCCTCAGC 
CAAGCTTAATT 

GATGGGAAACCCTAGCTTGCATTGAAGGAAAAATACATAATCCAAATAACAAACTGTCTTTCCNAATAT
TCTCAAAATTC 

GACTTCACCGTCTTCCAACCAAGCAGGT 



FIGURE 29 

>retrotransposon_12 1672bp Tca2-like LTR: 1346-1533

CCTATCAGGTACTTCCCCACTTGGATTGGCTTCTGCCTCTCTTCTTCTCCCAACCATCATCCCAATATC 
ATTCCACCCAT 

CGTCTTCATCGTTGTCGTCTTTTGTTGGTNTCTCTTCTTGTTTTTCTAGTTTACCACTATAAAAATCAA 
TCAATTCAGTT 

TGTTTTATGGCATCAGATTTATAAATTTTTTTAATTTTATCAACATAATTATCAACAATCCAATCAAGA
TGTAATTTATT 

CAATTTTTCTTGTAAAGAATCACCACCACCATTTCCTATTCCTTCCATTCTTGATAATATATTCCAATT
AGTTTCATGAC

ATAATTTCGTTAATTCATCTAAATCATTCAATTGTTGTTTATCATTAATAATTTGATTTATATTGATGG 
AAATTTTATCA 

ATTAAATTTTTAGAAATTTTAGAATTTAAATAATTTTTGATTATAGGATATTGTAATTCATTTATAAAT
CTAATTAAATT 

AGTAATTGATTTAATAAAATTGTTGTCCTCGTTGTCTGATACAATTTCTAATTTAATAGTATCTTCCAA
TTCATCAACAA

TCAAACTAAGTTGTTTTGAAGGGGTGGGGGTGGAGTCCCCCAATATTGAATCCACTAATTTATCCCAAT 
TTTCCTTATAT 

TTATCGTATGCATTCATATTATTATGTCCATTTTTCAATAAAAACCGATTGAAATCTTGTAAAATTGCT
ATATTAGTAAT

AGTCAATGGATCAGGAATTAAAAGAATAGTTAAATATTCATTCAATTGATTAACAAAATTTTCATAAAG
TGAATCGACTC 

GTTTCTTGATTTGTTTATATATAATATATTGAGAATTTGTATCAATGATGATTTGTTTAAATAAATTAT
TTAAATATTGT 

AAATCTAATATACTTTGTAATGTTTTCGGTTTCCCCAAATACGTTTCAATTTCTTTTAATTTAGAATTG
ATCTCTTGTAA 

TTCATTCAATTGTTGTAAATTGTCAGTAACGATTTCAAATTTATTATTCAATTCAGTAATTGTTAAATC 
AGTTAAATTGT 

TACTTTCAGTGGTATTTGAATCTTGAGGAATTTCTTCAAATTGTTTTCGGAAATCATTATCATTTTCAA 
GGGTTGTTTTG 

TTTATTTTGGATAATGTTTTATTTATGTTCTGTTCAATATCTTTTAAATATAATTCTTGATCTTCTAAT 




TGTTGTTCAAT 

CGATGGCTVTTATTGGTGTTGTATAAAAATGGAATTTTGTAAAGTTGAATGTGTTGGCAACACTTGTGTT 
TGTATGGGCGT 

ATATTTTTTGAGGAGATCAAAGCAAAAAATATTTTGAGACTTATACACGCAACATACAGAACAGTTGTT
GGTTTGTGCAC 

TATTTTGTGTCAGAAACTGATCAATGAAAATGATGGTTATTATGAGAATGGAAAATTTTTCCATCACAC
ATCAGGTGATG 

ACAGAACTAAACTATATTGTGTAGTATAAATAAGGGTATGAAATACCAACATCCCAGAATATCAACGAG
GATAGAANGGG 

ANGGAGTTTCAATTANAATAATCCTGTNGAATAAATAAACTTCCGGNTCCTAAATTCNNCTAATACCNA
CCAAACCTTAG 

NACCGTNGTAACANCGCCTCCAATCCTCCANGGGAAAAAGAAAANGTTTTAATAATTTCCCNATCCCGG 
ATT 



FIGURE 30 

>retrotransposon_13 690bp Tca2-like LTR: 464-690

TGATACGATTGAATGGTGGAGACAAAATATCCGATGTGTTGAAAGATAAAATTGTACTCGAATATCCCA 
CAATATATGTT 

GCTGCAAATGACGAGTGTTTACAAGATAGAATTATAGATAGCCTTCAATTGGCCGAGGAGGAAGAAGAT 
GACACCACTGA 

CTCAAGTGAGGATGATTCTAGTGACTCAGAGAGTGATGATGATGATAGTGATAGTGGTAGTGAAACCAG 
TAGTATTGGAG 

ACGGTTCAGGTGAAGATAACGATTCTGATTCGGCACCGGAAGAGACATCTCTGAAACTACCACCTTTTT
CACAGAAATTC 

TTTGAAGCGTCAGCTGAGCCAAAACCAATAATAGAAGAGATAGGATCTAACAAGACTGTAGAAGAACCA
TAACGAATGAA

TATAAAATACTTGTATTATGTAGTGCCAATAAAAGTTGAAACGGTCGCACTACTTTTTAGTCCTGTTGG 
TTTGTGCACTA 

TTTTGTGTCAGAAACTGATCTATGAAAATGATGGTTATTATGAGAATGGAAAACTTTTCCATCACACAT 
CAGGTGATGAC 

AGAACTAAACTATATTGTATAGTATAAATAAGGGTATGAAATACCAACATCCCAGAATATTAATTATAT 
AGAANGGAAGG 

AGTTTAATATATATCCTGTGGAATAACAACTTCGGTCTAATTCACTATAC



FIGURE 31 

>retrotransposon_14 1912bp POL protein: 1169-1839

CTAGGTTTTAATTCACTATCATAAAGATCAATGGTTAGCCCAAAATTAAAATATGGAAGCCAAAACTTC 
CGTGGTCAAAA 

AATGAACTAAGAAGCTAAAGTCTTTTTGAAACAGTATGCCATTATGTTTTTCAGATGTTTTTACTTGGT
TGTTATATTAA 

AATCCAAAGCTCTGGCTCTTATCAAGAATTTGTCAGTCAACTCATCATCAAATGAGTGGATATATTACT
TTCAAGAATCA 

TCATTACCAAGTTGTCAAACGATTGCTAAGCAAATGTTGAAGAATACTGATTATTTCAGTTTTGAGAAA 
CCTAACCCCAA 

AGATAATTTAAGGAGAATCAAAATTTGAAAGAAAAGGATGAAAAGTTGGAGAAAGAAACCCTATTGAAA 
ATTTAAGTACT

GATTGTTTCAGAAAATCATTGAATATGAAACAACAGAAAGGATATTTTACCAACTAATGAACATTTTCC 
TCCCTTATACA 

CCTTAAAATACATTAAATCCTTCTGGAATAGTTTTTTCTCACAAGACATTTTGGTGTATAACATTGGTA 
CTATTGTTGCT 

GTCATGACAAATAAGGAATGCTACAAAACGTCAAGGTAGAAGCTATCGATGTTTTTTCCAGCTAATGAC 
AGGACAACGTT 

AGAAACGAAGTGTGCAGACGATTTGGTTACAAAGATTGCAAGTGTATCAATTATGCTAGCATATACCTT
ATATTTTCGTT 

GAGAGTATTTTTATCATCGTTGGTCTGCAAAACTTCAAAGAAGGGGTGCTATATGTGTTAAATGCTGAG 
AATCGAACACT

GTATCTCATGGCGATAAAATTCAAAATATTGTCGTTAGTATGAGAAGATTTTGCTGATATTTACTTATA 
TTTCACAATGT 

TCAGTAAAGATCCTTATGACGGTGGTACAATATGGGACATGCTATCTGACACGTTGACAACCACTAAAA 
TCAGCTGTTAC 
CGATAGAGACCATACAGATTGACGCAACACATAAGTATACTCGAAAAGCTAACCCACCATATCAGGCAT
CAAGCCAAAAA 

TCAATTTTGACTGAAAATGGACGTCATTAACTCTGAGTCGCTAAAATCAAGGTATGAAATATTTGCCAA 
AGAGGAAATCG 

ATCAGAGTCGCAATTTCTGTTCAATATTCAACCAAATACAATTTTCCAACCTATAAATCTCCACCATCT 
GTGTTATGTGC 

TGTCATTGAGTTTGCAACTGATATTTTTGCTATATCTTTACGTTGCAAAATATGCGGGGTGATGTTAAA 
CTTACCCGAAT 

TCTCCGTGTATCACATGTTATTATGCCAAATATGCATATCTAGGAAAACAGTCTCAACCATCTAACACA
CACATTTTCTC 

ACCACTGAAGCTATGAAGATAGCCCATTCGGGAACGGTAAACGACGTAGCGGGAAAAATGTGCTTAAAA
GAATATGGGAA

AATAAACGGGTAGACGTCATTTCCCAGTACCATATTCTATTCAGTCGAACGTCTTCATTCTTATCAACG
GGGGACTGGTC 

CAGAGACCTTTCTTATTTTATTGTGATTCAGTAGCGTCTACCATATACAATGATATTGTAACTTCCGAT
CAAGTGGAAAC 

ACCGGGAGCTTCCAAAGTATGGTATCCGAATATAAAGCCACCCAAAATCCAATTCACCACGAGCTAACA
CCTGGGGAAAA 

CGAGGTGTCTAAACCTCCTCAACTTGATTTCGAGACTTCGGTAGTAGGGAAGTTTAAAGGGCCTATTAC 
AACCACAAAAG 

TGGCACCACCACCCTCCATGGGAGGTCTATTAAGTACATGGAAACGCATGCTCTGGTTGATACATCACC 
TCAATCAAAAA 

AATTGGTGTTCCACATTCGAAGGAACTAAAACCGACGAGAACCTATCACACGGTGTCGACGATGATAAG 
AAA 



FIGURE 32 

>retrotransposon_15 6140bp POL protein: 1555-4302; LTR regions: 979-1292, 5212-5525

AGTAAAAAAAGAAGAAAAAAAAGCTAAAATTGGGACAATATGCTAAGTATATATAGGGGAAGACGTCGA
ACAGCAACCAC 

GGAAAAATAATAGTGATTGTCTTTATCCGTTATTGGCTGGATGGCGACGCCACAACCTGAAATTTGGTT 
CCAACTGTTGA 

GGATGATTTATGTTTGTGATTAGAACTAAAATCATTCGAGAAAAAAGGAATAGGAGAGAACCAACTTTA 
GTCGTGTAAAA 

AGTAACATCTGCCAATTATAAACTATACGTAGTCCAAATAATTTACGGTATATTTCTGTACCCCTTCTT
GGCAATATCAC

AAGAATATCATAATGTTCATGAACCCTCTTTGAACACGTAGACAAGTAAACCCAATGAGGGGGCAGTGT
TCTATTCTTGT 

AAACTGCGCACCAAAAACGGGGCTTAAAAAATAAGTTATGAAAACTATAAATAACCATGAAAATCACCC 
TACTCCCTTCC 

TCCCTTCCTTCCTTCCTTCCTTCCTTTTCTCTTTTCCTCTACCCACACTACTCACAATGTTCGGTATTT 
TTGAGGAAAAC

TACGATTCTGTTTACAAAGGCAACCACGAAGCCAAGTTCTCTCACGAAGCAGTTGCTGGTGCTGCTTCA 
TTTGCTGCTGT 

CAAGTTGTTTGAAGATAGACAAAGAAGAGAAGGGAAACCAGTTAGTCACGCCTTTGCTAAAGAAGCTTT
AGCTGCTATTG 

CTGGTGGAGAAGTCGACAAATTATTTGAAACCAAAGGGTTGGACTATTTGGATAGAGAGAGACTTAGAG 
ATCAAGCTATC 

AACAACGCTCAAAGAGGTTACGACGACCATTACGGTCAACACGAAGAATGGTCTCCAGAACACAGACCA
CCTTTTGACTA 

CCAAAGATATTAAGTAGAAACTGTGTAGTGAATTTACAATTTTTTTGACAAGAATTAACTTAAACCTCG
TTTTTAGGTTT 

TGTGCGGCTTTTGTCAATTGACGATCCTGTATATTTCGTCATAATTCACACATTCTTAAAATTATGCAC 
ACATCCTTGAA 

ATGTGTTAATATTCCCAACATTATCAATTATATGTGTTCAGAATTGGTTGCAAAGTTATCAACTCAATT 
CACGCTATATA 

AACCTTACAAATTCTCTACATTTTTATATTTTTTTATATTGGCTTTTCTTTTAGAATCAATCAATACTT 
TTTTTATCATT 

TAGATACATCTTTCATCTATTAATAGATTATCTTTCTATATATCAAAACACGACACAGTCACGTGCCAA
AAAGGATATAA 

GAAGGAACTTCAGAAAATTAATTTTCTGATTATACTACTTACTAGATTTCATAAAGTCAATATCTGATT
GATACAACTTG 

GTTCATTATTCATAAAACTTTACAACTAATTCNACAAGNAAACCCNACAAAAAAATCCNAATNAAATAA



TCNNNNNAATA 

TTATAATTAATTAATTACAAAAAAAAACAAAAAAATACACACACACATACACACACACAAAATCTTGTT 
GCAAAAAAAAA 

AAAATAATAATAATATAATAAGAATTAATTAACAATGTCGTTTCCACGGACACATTCACCAAGACCATC
TGGTTCACGAG 

AACAGGAAGATCTCACACTGATGATTAAAGCTTTTAGAGATTCAATGGAAGCTAAGCTTGACTTGCATT
CGCAGAAGCTT 

ACTGCTTTGGTAGCAAACATTCCCAGAACGGACGAAGGGTTTGAAGATTTATCACAAAGGATCACTGTT 
CTTAAAAATCA 

TCAAAAAGCATTTTTGCCCAAACAAGAAAAAGAAATCGGAAGTCTTCTCCACAGACAAAGAGAGGAAGA
AGGTGATATTA 

AGGATTTCAAAACAGTCGTTGGTGAAGAAAAAGAAGAATTGCACCAGGTTGAAGATTTCGTTTTAAAAG
ATCAAGAAGAA 

TTACGAAACGTCGAAAAGAAAGTTTTGAAAGAAGAAGAAGAATTGCAAAAAGTGGAAGAGTCAATGGAA 
AAGGAAAAACA 

AGAGTTATACCAGGTTGAAGACTTTATTTTGCAAAGAGATGAGACGGTAAAGAAACTTGGAGAAAGCAA
TCAATCTCAAC 

AGGAACCATATACACCTGCAACTTCTGGTTCGGATCAGAGATTCAGATCTCAACAACCTAACATTGGAA 
ATACCTTAGCG 

CAGGATCTAGCATTAATTCCAAAATTAGATCTGGAAATTTGCAAAATTGCAGTCAAATATCCAAAATTA 
TTTGAAACAAA 

ATTAAGACCACCACCACCCAGAGACTTTCAATATAAAATTCAACTCACAGACCACACTCAAATTTATTC
AAAACCATATA 

AATGCAATCAAGAAGAACAAGCTCTCATCAAGGATTTCATCAATGAAAAATTAGAAGCAGGCGTTTTGG
TACCAGCTCCA 

ATTGATGCTTGGTTACACCCAATATTTCCAATCAGAAAAACCAATGCCAACCAATCCTCCACCAAAATA
GCAGTTGATTT 

AAGACGTCTCAATAAGGTCACAGTACGAATGTACACTTATCCAACAGACACAAAAGACCTCTTATCCTC
ACTAACAGATT 

CCCACTATTTTAGCGCTTTAGACTTAAAGAATGCGTTCTATCAGGTAAGCATACACAAGGATAGTATAA 
AATATTTTGGG 

ATTTCAACATCCGAGGGGAATTATTGCTTTACAACTTTACCGTTTGGAGCAATCAATTCCCCAACCATC
TTTACTAACTT 

TGTGAGACAGATTTTAGAGGGGATCCCATGTATATTTATATACATGGATGATATCCTCATCCATACTAA
AACCTTACATG

ACCACATGTCATTACTCAGGAGAATCATGGAGAAACTAAATGAGCATCAGTTTCAAATGAATTATAACA
AGATGCAATTA 

TTAACAACAAAAATCAATTTCTTAGGGTACAGCATTCAAGCGAACAAAATATCACCAGATATTTCCAAA
ATTCAAGCAAT

ACAAAATTGGGAATTGCCCACGACCACTACTCAAATCAGAGCATTTGTCAATTTCAGCAACCACTTTCG 
CATCTTCATCC 

CAGAAATAGCAAAATTTACTAATCCATTAAATGAATTATTGAAGAACAACAATGGTAAAAACATAAAGA
TTGAACACACC

CAAGCATCCATTGATGGTTACAAGGCATTAAAAGCCGCCATCATTGGATTGCCGACGCTTCAACTTTAC 
AATCCAAAACT

ACCAACCATCATTTTCACAGATGCTAGCCACATGGTAGTAGGAGGATATTTATGTCAACCAACATTCAG 
AAATGACAAAG

AAGTCCTTGTCCCAATTGCATTTTCATCACATAAATTAACAGAAACACAAAGCAGATATGCTGCTATGG
AAAAGGAACTT 

TTGGCAATTATTGTGATATTGGAAAAATTTAGATATCACTGCAGCAATACGGTAGAGATCTATACAGAT 
TATCAAAGTTT 

GGCATCATATTTAGATAAGAAAACTACTCCACCACCGAGAATTGCTAGGTTTTTAGATCTAATTGGATC
ATTTTCCCCAA 

AAGTGTACTATTTAAGTGGAAAGAAAAATTTCGTTGCTGATATCATTACAAGATATCAAACTCAAAATA
TTAAGGAATTG 

GTAGATGAAGACAAGATACTAGGACAGACTTTTACAGTCAAGAGAAATTTGAAACAACAACTATTACCA
AGATTGGAAGC

AATTGAATTGGAAAATCTTAATGAATCACAGGTTCACAAAATCCAAACTTCATTAGAACAACAACAACA
ACATGATTTGG

AAGACAATGATGAAGAGTTACCTCTCCAACTGTTTAAATTAATGAATGATGAGTTATTTGTAATCATTA 
ACAACCAACTT

TTAAAATACCTTCCAAGACTGGAATACAATGATATTTGTCAAACAATCCATGACAAACACCATCCATCA 
ACTAGAGTAAC 

AGACTACTTATGCACACTCGCATATTGGCATCCTGACCATCTATTAATTGCTACAAACATTACGAGAAA 
GTGTCACTATT 

GTCAACTAAACACGTCAATTCGTGAGGCCATTAGACCATACCGACCACTTGAACCACTCAAGGCATTTA 
GCAGATGGGGA

ATGGACTACTCTGGACCATACTTTAACACAGTCCAACACAGGTACATATTAGTAGCCGTGGAATATGTC
ACTGGTTTAAC 

TATTGCAGTACCAACATTGCACAAAGACGCAGATAACGCAATCAGTCTTTTACAATCAATCATTCTGAT
CATGTCAGCAC 

CTACAGAATTAGTTACAGATCAAGGTAAAAAAATTTTCATCACAAGCTTTGGCTACCCTATGTGACCAG 
AATAACATACA 

ACACCATATTACCTCCGCCCACCACCCACGTGGGAATGGTCGGGTTGAGAAGGTGAACCACCTATTGAA 
GAAAATATTGA 

AAGCATTAACTAACGATACGATGCAAGACTGGGATTTAAAACTATATGACGCTTTAAGAATCTACAATG 
CTACACCTACA 

ATTTTTAACTACACTCCACTTTATCTTGCACTTGGAATTGAACCACACCATAATTTAAATCAATTACAA 
AAAGATTTAAT 

TGAAAATTTGCAAAAAGAATTGCCCCCAGAGGTCCAATCCACAGAAGAACACGAAGAAAACCCAAATGA
TGAACAACAAG

AAGAGGGCAGAGAACAACAAATTTCAAGAGAAGAACAACAGGACGGCAGAGATCTTGTACACTTAAGAA
TTTACGAATTG 

GAAGCAATTAAGAAAGCTCGCAAGTTACACACAAATTTGAAAACACGAAGAAACGCAGTCCAAAATATG
TTAAAGGAACC 

ATATGGCATTCCAGCACTTTTTACAAAGGGACAATGGGTATACAGAATTAGAGCTAAAGCACGAAAATA 
TGAATCAAATT

TTGATGGTCCATATCAAGTTCAAGAAGTATTAGGTAAAGGTGCTTATAAATTGAGAGACATCACTGGAA
GAGAAAAAGGA 

ATCTACAATCAGGATCAGTTGAAGTTAGCATATTCAGCAGACAACGATCCAATACAGGTTTTTAGTTCT
TTTAATAAAGA 

ATATGATCGAGTACAACAAAAATTGTTAGACAAAATTCAATCAGAAAGAGATCATCAATTAAATTGTTT
GTCAGTCCAAC 

ATTTACACAGACAAAGAAGGTTACTCGATATATCCAGCTGTCTTGAGCAAATTCTGCAATAATTTCGCT
AATCATTGGAG 

GAAAGGGTAGATGACGATCCTGCATATTTCGTCATAATTCACACATTCTTAAAATTATTCACACATCCT
TGAAATGTGTT 

AATATTCCCAACATTATCAATTATATGTGTTCAGAATTGGTTGCAAAGTTATCAACTCAATTCACGCTA
TATAAACCTTA 

CAATTTCTCTACATTTTTATATTTTTTTATATTGGCTTTTCTTTTAGAATCAATCAATACTTTTTTTAT 
CATTTAGATAC

ATCTTTCATCTATTAATAGATTATCTTTCTATATATCAAAACACGACACAGTCACGTGCCAAAAAGGAT
ATAAGAAGGAA 

CTTCACTGAAATGCAATCACTTCGCATTATTCAAGATCTTTTTCTATTGTGGCTGGTTTTTGGTGATTG 
CTATGTTTGGT 

TTTTTTTTTCTGGAACACAAGCAACCAAATTTTTCAACTGTTACGTCACACATTTACTGTCACACTCAC
TTACTGGCACA

CAAAGAACAAAGCAATCATCCGGCGTAAACTTTTGGTCTTTGAGATGCAAAAGTTGCAAAGCAATTGGC 
ACTTCTACTAA 

GATGGTTCCAGTAAAAATTGTGTTTTATAGTACATCAATAATCAAACAATACTTAATGATGTAACAATA 
CCTTAAAAAGC 

CCCCACTATATTTCTTTTTTTTTTAAGTTTGCTATATAATTTATTATGTGTTATTATTATTGACTTAAT 
TGTTAGCATTT 

TATTGCTTGAGATCGTTTGCTTGTCACTCCACCCTGAAGAAAATTTGAATAATTGCTATTAATTTATTT
ATTTCTTGGAC 

ACACCCCGTATTGTCGTATGGGTATAAATTCCGTTTCATTTCTCCTCCCTATTTCATATTTCATAACTT
CTTAAATCAAT

ATTCAAACCAACTCCAAATTATAAACTATCAAACAAAGAAACAAAAAAACACACAACACA 



FIGURE 33 

>retrotransposon_15 POL protein 916aa 

MSFPRTHSPRPSGSREQEDLTSMIKAFRDSMEAKLDLHSQKLTALVANIPRTDEGFEDLSQRITVLECNH 
Q 

KAFLPKQEKEIGSLLHRQREEEGDIKDFKTWGEEKEELHQVEDFVLKDQEELRNVEKKVLKEEEELQK 
V 

EESMEKEKQELYQVEDFILQRDETVKKLGESNQSQQEPYTPATSGSDQRFRSQQPNIGNTLAQDLALIP 
K 

LDSEICKIAVKYPKLFETKLRPPPPRDFQYKIQLTDHTQIYSKPYKCNQEEQALIKDFINEKLEAGVLV 
P 



APIDAWLHPIFPIRKTNANQSSTKIAVDLRRLNKVTVRMYTYPTDTKDLLSSLTDSHYFSALDLKNAFY 
Q 

VSIHKDSIKYFGISTSEGNYCFTTLPFGAINSPTIFTNFVRQILEGIPCIFIYMDDILIHTKTLHDHMS 
L 

LRRIMEKLNEHQFQMNYNKMQLLTTKINFLGYSIQANKISPDISKIQAIQNWELPTTTTQIRAFVNFSN 
H 

FRIFIPEIAKFTNPLNELLKNNNGKNIKIEHTQASIDGYKALKAAIIGLPTLQLYNPKLPTIIFTDASH 
M 

WGGYLCQPTFRNDKEVLVPIAFSSHKLTETQSRYAAMEKELLAIIVILEKFRYHCSNTVEIYTDYQSL 
A 

SYLDKKTTPPPRIARFLDLIGSFSPKVYYLSGKKNFVADIITRYQTQNIKELVDEDKILGQTFTVKRNL 
K 

QQLLPRLEAIELENLNESQVHKIQTSLEQQQQHDLEDNDEELPLQSFKLMNDELFVIINWQLLKYLPRS
E 

YNDICQTIHDKHHPSTRVTDYLCTLAYWHPDHLLIATNITRKCHYCQLNTSIREAIRPYRPLEPLKAFS
R 

WGMDYSGPYFNTVQHRYILVAVEYVTGLTIAVPTLHKDADNAISLLQSIISIMSAPTELVTDQGKKIFI
T 

SFGYPM 



FIGURE 34 



>retrotransposon_16 3470bp POL protein: 309-2552 

GTATATTTCAAGACGTTATTTCTTGTGACCCTTGGATGACTACTCAAAATACTTGACAGTTCAACCCAC 
TATGCAACAAA

TCTGATGCTACTGCCGAAATTATCGAATTCATCAATCATTGGGAAAAGTTCTTTCTGGGAAATGGCAAT 
TACCATACGAA

AATTCTCCGGTCGGATAATGGAGGGGAATTCTTAAACAAAACATTGACTACCTATCTTGATTCAAAATA 
TATTACTCACC

AAACCTCCAATGCCTATGAACATCATGAGAATGGCGCTGCAGAACGAGCTATTAGATCGGTTAAAGACA
TGGCTCGAGTA 

ATATTGCTTCAATCCAAATTACCAGTGCCGTTTTGGTCCCTAGCAACCCGATGTGCTGCGTTTGTTATG 
AATCGTCTTCC 

TCATAAAACAATAAATGGTAAGATTCCTTATGAAGTATGGACTAAACAACTTGTCAATCTCAAAATGAT
GAAACCGTTTG 

GCTCTCAAGTATATGTGAAAATTCCTATTGGAGTCAAAAGTTTTTCTGCACAAGCACTTTCTGGAATCA 
TGGTGGGATAT

GCCACTAATAAGAAAGGCTACCTTGTATATGATCCCACACAAAATCGAATATTCACATCCTCACAAATA 
ATATGTCATCC

GAGCATTTATCCAGCAGCCAACCTTACGTTTAACGAACCCTTAATTATCTCATCGAAAGTCACGGCTGC
TCATCTTCACC 

CCCTTACCATTTCCAATTTAGTTATTCCACCTACCAATGCTGTATCTGAGACACCTCTTGCAAATTGTG 
TGCTCTCCTCA 

AATTCGTCAGTATGTCCCAAAGTTTGCCAATTACAAACTGTCTTGGAACATGGGGAGGATAAAATATAT
GCACTGATTAT

ACCAATATCGATCGGCAATATGAAACGCACAAGAACAAATGAAAACAAAATATGCCAGCTAGATGAATC
GAACAATACCA 

CCATACCAGATAGTGTAATTTTATCGGCTAACAATGTGTTATTAAACTTAGAATCGAGATCTTCCATTC
CCAAAAGTTAT

AAGGAAGCTATAACATCTAATGAAAAATCCAAATGGGCTGATGCTATGGATAGCGAGTTTAATTCATTA 
CAATCCAACAA

CACGTGGTCACTTGAACCACTACCGGAGGGACGCAAAGCTATTGGTGTCAAATGGGTTTATACAATCAA
GGACACCGGTC 

GCTACAAGGCTCGCCTTGTGGCACTTGGTTATCGACAACAGGCTGGTGTGGACTTTCTCGAAACGTATG 
CTCCCGTGATT 

CGTGGAGAATCAATCAAACTAATCTTTGCACTCGCGTCAAAATCCAAACTAAAGATTCATTCCATAGAT 
GTTACCACAGC 

TTTCCTCAACGGGGAAATACTGGAACTCATATTTGTGAAACAACCTCCGGGATATGAAGATAAGAAGCG 
TCCTAATCATG

TTTGTAAGCTCAATCGCAGCTTATATGGGCTTAAGCAGCTGCCACTAATGTGGAACATTAAATTAAATG 
ATGTACTTATA 

AAGGAAGGTTTCCGTCGACTTGGTGGTGACTTAGGGATATACATTAGTAAGGACAAAAGAACAATAATG 
GGAGTTTATGT 

TGACGACATTCTCATTTGTGGACCTTCTGACAGTGAAATTGAACAAGTAAAGAACAACGTGAGAAAATA
CTTCTCAATAA

CTGATAATGGATTATGCCGAAAATTCCTTGGAATTAACGTCTATCAACAAGCAAATGAAATAAGATTAA 
GTTTGAATGAT 

TATATAAGGAGAATGATTGAGGAGTTAAAATTATCTGTCTCAGAAACAAACCCAGTATCTATACCATCT
GATGTCAATTA

TGAAATATTTAAAGTTAACGAAAATGATGATGAGAAACCATGTGATCAAACCAAATACCGAAGTTTGAT
AGGCAAGCTCT 

TGTTTGCCAGTAATACTATAAGGTTTGACATCGCCTATTCTGTCAACTCCCTATCCAGGTTTATCAACG 
ATCCCAAAGAA 

AAACATTGGATTGCAGCTGTCAAGGTGGTAAAATATCTCAGTGGTACTCAACGGTATGGTATTTGTTAT 
AACGGTAACGG 

TGACTTGAATATTTACGCTGATAGTGATTGGGCTTCCACTCCATCTGATCGAAAGTCTATTACGGGGTA
CATTGTTACCT 

ATGCTGGAGCGCCGATAAGTTGGCGTTCCAAGAAGCAGAACGTGATAGCCTTGAGTACGACAGAAGCGG 
AGTTTATGGCT 

CTCACAGAGTCCATAAAGGAAGCCCTTTGGCTAATATACATTTTTCGAGATATTAATGTGATATTGAAA 
TTACCAATTGT 

GATATATGAAGACAACCTACTGTGTCAGAAATTACTTGAAAATCCTCGATTCCATAATAGGACAAAACA
CATTGACTTGA 

AATATAAATTTACCAAAGACCATATAGAAGCTGGTACAATCAAAGTGGAATCAACTAATTCAGCAGATA
ACTTAGCCGAC 

ATGCTAACTAAACCTTTACCAAAAATTAAATTTAAACATTTAAGATGGCTAGCAGGATTAAGACCTTTA 
GATTGATTAGA 

TAATGATAAAATGAAATAAAGATTAATTTGGAGATGCAGGTTGATGGGGAGGATGTTGGAAAAATGAAA 
TATGATCAATC

CTGCATCTAGAACCTGTGGCAGAATGAAACCTACGAGATTATGAATGACTTGTGAATACAAGTTGAATG
TTACAGAATGT

TACCAAGAAGGTTACACTTGAATATATGAATGACTAGAAAGTGAATTGAATGTTACAGAACCTGAATAA 
CAATGTTACAC 

GAATGTGTGAATGATATGAGTTTATCTATAGTAATGTGACATATACACAAAGGTGTGAATGACCGAGAA
AACAGATGTTA

CATTACGGGCACTGGAGAGTGCAAGTCTAAAGAATCTTGGAGTAGAAATAAGTAATATAAAAAGGACCA 
AAGATTCTTTA 

GAGAAAAGTAAATGAAACTATATTAGATTTTATATAACTAACTAACAAATAAATAAAAAATATAATATG
TCTACAATGCC

ACCAACTTCCAAACGTACTAGAAAGAGAACTAGAACCGATGATAATGCTGAACCAACTATTCAAGATCC
TTCACCGCCAC 

TTGCTAATGTTGAACCCACAATTCAAGAGACTCCACCGCTGGTTGAAGTTAGTGATGAGACTAATTCAA
CTGAAATCAAT

GAGACAAATAGTAATACTCATGAAGAAACAAATGTATTAACTAATGTGCACTCCTCTCCAATCGAGACA 
GTTACTGAGAG 

GAACTTCAATTTTCAACAATAATAATATTGGTTGGATTTACACGTACGTTGTTGTTACAAAGACGTGAG
CAGAGTGAGAG

AGATCAACCTTCATATTCAATCTCATCTCAATCAACGCTCAATTTTTTTTTCTTCTCCCTCTCTTTGTT 
GTTTAACTAAG

TTTGTTCCCTTCCATCCAAGCAAGTTAGAA 



FIGURE 35 

>retrotransposon_16 POL protein 748aa

MARVILLQSKLPVPFWSLATRCAAFVMNRLPHKTXNGKIPYEVWTKQLVNLKMMKPFGSQVYVKIPIGV
K

SFSAQALSGIMVGYATNKKGYLVYDPTQNRIFTSSQIICHPSIYPAANLTFNEPLIISSKVTAAHLHPL 
T 

ISNLVIPPTNAVSETPLANCVLSSNSSVCPKVCQLQTVLEHGEDKIYASIIPISIGNMKRTRTNENKIC 
Q 

LDESNNTTIPDSVILSANNVLLNLESRSSIPKSYKEAITSNEKSKWADAMDSEFNSLQSNNTWSLEPLP 
E 

GRKAIGVKWVYTTKDTGRYKARLVALGYRQQAGVDFLETYAPVIRGESIKLIFALASKSKLKIHSIDVT
T 

AFLNGEISELIFVKQPPGYEDKKRPNHVCKLNRSLYGLKQSPLMWNIKLNDVLIKEGFRRLGGDLGIYI
S 

KDKRTIMGVYVDDILICGPSDSEIEQVKNKVRKYFSITDNGLCRKFLGINVYQQANEIRLSLNDYIRRM
I




EELKLSVSETNPVSIPSDVNYEIFKVNENDDEKPCDQTKYRSLIGKLLFASNTIRFDIAYSVNSLSRFI 
N 

DPKEKHWIAAVKVVKYLSGTQRYGICYNGNGDLNIYADSDWASTPSDRKSITGYIVTYAGAPISWRSKK 
Q 

NVIALSTTEAEFMALTESIKEALWLIYIFRDIWVILKLPIVIYEDNLSCQKLLENPRFHNRTKHIDLKY 
K 

FTKDHIEAGTIKVESTNSADNLADMLTKPLPKIKFKHLRWLAGLRPLD 

FIGURE 36 

>retrotransposon_17 1550bp LTR zeta: 887-1394 

GTGTTGTGTTGGGTTTGAATTTCTGTATAACTCAATTTGGAGATTTTTTTTTTTTTTTTTTTGAAATTT 
TTATTAGTCGT 

GTACATTGTTACAATTGTTTCTCGTTCCCCTTTTTTTTTCCTTTCTTTGTTTTGTTTTGTTTACCTTGT 
GATAATTTTAT 

ACGTGTTGAGAGGGCTCTCGTCGTGCCCGTGTCCGTTTCCGTTTCCGTGTCCTGTTGGGTCCCCTCCGC 
CCATGCCGCAC 

CGCACCGTACGGTAATGATATCTGATTGTTGTTGGAGCGTTCTTCGCTAACAGGTTCTTTCTTTTTGTT 
CAGGGGTTTCG 

AAAGATAATGTAGAAACACCAGGGCTTATAACTGAGAGTTAGAGTAGTGGAGATTAGTAGTAGTAGTAC 
AATCCTATAGC

CCAAACATTATTGGAGAGATCTTACCAAATAGCAATCATCATGATGTATTTACTACTACATAAATNATT 
TAAGACGACAT

TTACCAGCAATAAACAACATGACCAACTAATTAACAAACATTTGAAAAACATAAAGTAATTAGAAAGTT 
TAAAAAGTGTA 

CAACCAGTGTGGAAAAAGAATGGAATTGGAATTGAACAAAGTTATTAATTACTGAAAAAGGAAATTTAA
TTTCTTGAAAG 

GCAAATCTTTGTTTGTTTTTTTTTTTGGGTCTTTTCTTTCATTTAATAAGCGTGGGGTATTAATAGATA 
ATGATATTGTT 

GTTGTTATTGTGATATTGTTGTGAAATTTGACATATGATAAGATAAGTTTCTTTCTTTTCTTTCAACTA
GTATAATTGAA 

CTAAAGACCACCACCACCACCACCACATAGTTAGCAACCTGATATGCTGTTCATGTAACAGTAAATTAT 
CTTGGTACTAT 

ACCACTTGTTGTAATATAGCTAATGCTAATTCTTGATTAGTGTGGAAAGCCTAATAAGGTTATATTGTG 
CACAGGTTAAC 

TACCTTAATATAGTTATTGTTAATACAGTTATTGCTGTTGACTACTATTGTTATTGTTAAATTAAAGTG 
TTAGGTTGAGT 

TAATTGATTAGTGAAAACCAACTAACTACCGTATTAAATTATTGTATTAAGATTGATTCCTATTAAGGA 
TAAAACAGAGA 

GTGTGTTAGAAAGAGAAAGGGTGGATTATAAATATGTGTAAAATCCCCTTTAGAGACTAACCACTAGAA
ATCTATTGATG

GTTTCATATATAGAGATTAACGATTATATTTATAATATAAGTTGGTAGTTGCTAGTATATNTGAAAGCA
CTACAGTATAG 

TATGTCAGAATCAGATCATTTAAACTCTACTAATAATACAGGAAACACTTTCATTAGTCTAGATCAAGC 
CAGTACAATAA

TGGCAGATCAAACTCAAGGAGCTAACCCACAACAATGATAATTCATCTTTTTTGTCAAGACGATAGTTA
ATGTTACAAGC

ACTTTNATTGGGCTCGAAATAGTGGTAAATAGGGTCCATAGGATATGACCTGTTACAAGTTTATTTCGA
TGATCNAGCCG 

GCCTCTGTGATTACGGCAATTATTTTTACC



FIGURE 37 



>retrotransposon_18 2132bp LTR zeta: 1418-1926 

TTTTTAAAAGAATTAATTAAATATGATGGATGATAGAAATTAAAGGAAAAAGAAGAAGAACAAAACAAA
AGTTTAATTGA 

AAAAAAAGGGAGAAATGAATATTGAATTATTCAGCTTTTATATTGCTGATAGATGTTGAAAAAAAAACG 
GAAGAATGGGG

ATAGCAAAACTGTGGGTGAGATTAACTCATCTATGGCGCTAAAAGTCTTTTTTTTTTCTCTTTTATTAG 
GGGGCACATAA

ATTATTCTTTTCATTGATAATCCCGAGTCCGTTTTTTGTTCATTATTCGGAATATATTACCGTATTGGG
AACGATAATTA 

TTATTAGTTCTCCCCGATGGTTCGATTTTGCTGGTGCAAAAATATAAATCCGATATAACTTTATTGGTG 



CTTTGATAAAT 

CCGTTTTATAAGTTGGTAGACATATACAGGATGATAATAATTTAACGGATTTATAAGTTGGAATCATTT
GGATGAATCCG 

CTTGGGGAGGCGTTTTCCAATTTTAGAAGTTTAACTATCAATTTTATGTGACATCCGAGTATACACATT
TTGTGAATTTG 

ATCTTGTAAACTCACTTGGTGTACCATGGCATTTATAACAACACTTTCTAGAATCGGCTGAGTTACATG 
CATTTCCTCTA 

TTTGTAGATTAATGGAAATTCATGAAATCGTTCACATTTTTTTCTATAATGAGTATCGTTCGGTTTCCA
TAAGTAGGGGA 

CTAAAAAATAATTGATATCTCTAATCAGTGACAGCTCTAGTCAACTTGACCGTAATGTTTTGACGACCA
TTATATTTCTT 

GTTTGAACTATTGATTTATGAGTGTTGTCGTAACAAAAGATCAATTCCCGTCAAAACGCATTTGGCACT 
TAATCTTTGAT 

TGAACCGATTTTGATCTCAAAACATAGTACCAAGGTCAATTATGTTCGCTAATGAAAGAAAGCTGTGAC 
GAAAACCTCAA

ATTCATGAAGAAAGAATTACTGTTGTGGAAAATAAAAAAGTCTTTCTTCTGATACTTTACAAGTCCCTC
AACCACAAATA 

CAAAAATGAAAGTTACCCATCGATCTTTTTCATTGGTTAAGAATTAATACGAGAATATCAAATTATCTT
AGAGAGGGTCT 

CACAGAGCAACTTTCTGAGGCACACGGTCACCAACATGATTTGTTATAAAAAATTCAACCAAATTTTGG
AAAAAATGAAA 

ACAAAACAAAACAAAATCTGAAACATCCCGAAAGTCACAAATGCTTGATTACTTAAAATTACTTATTTG
CTTCAAGACGC

TATTATTATTATTATGACATAATACTACTTGAATAACAGTGAACTGTAATTGTATTAAGAACAAATCAT
AACAAAGGAAG 

ATGATGACGATGATGATGACCCCTTGAAATATCCCAGGGCACATGCATTGTGATGATTGTTGTAATATA
GCTAATGCTAA 

TTCTTGATTAGTGTGGAAAGCCTAATAAGGTTATATTGTGCACAGGTTAACTACCTTAATATAGTTATT 
GTTAATACAGT 

TATTGCTGTTGACTACTATTGTTATTGTTAAATTAAAGTGTTAGGTTGAGTTAATTGATTAGTGAAAAC 
CAACTAACTAC 

CGTATTAAATTATTGTATTAAGATTGATTCCTATTAAGGATAAAACAGAGAGTGTGTTAGAAAGAGAAA
GGGTGGATTAT 

AAATATGTGTAAAAATCCCCTTTAGAGACTAATCACTAGAAATCTATTGATGGTTTCATATATAGAGAT
TAACGATTATA

TTTATAATATAAGTTGGTAGTTGCTAGTATATTTGAAAGCACTACAGTATAGTATGTCAGAATCAGATC
ATTTAAACTCT 

ACTAATAATACAGGAAACACTTTCATTAGTCTAGATCAAGCCAGTACAATAATGGCAGATCAAACTCAA 
GGAGCTAACCC 

ACAACAATTACCATATTATATGAAGAAGACTATAACAAAACTGTAGATAGTAGGGGATTGGGTATTTCC
GGGGGAGTAGA

AGTATTGGGGTTATCTAAGTCCATCTTTAACCACCCAACAATCCAACAACAACCCAACNACGTTTTTCC
CCAATTCTCNG

GAGATNACTTGATTAACTTNAAATTTTTCCNTGGCCAAAAAATTTCCTTTTC 



FIGURE 38 

>retrotransposon_19 1734bp LTR zeta: 767-1274

AATAACCAACCAGCTGCTCATTTTTAGATGTATGTATTTTATAGGAAAATTGAATAACTTGTTATTACT 
ATGGCCTGTTT 

TCTAAAGCCAAGTTGTTTCTTCTTATATTTTTTTTTTCTAAACACCGTTTGTTGAAGATGGCTTTATCC 
GTATACTATTG 

GGCGTCGATTTTCGCACAAAAGCTTTTATCCACGGAATATTTGCGATAATATAGTACAAAAGTGTGTTC
TAGTCTTGTAA 

ATGTCCAATATTTTTAGTACAACGATGGAAACCCGTATAGCGCAGACACAGTTTGGATAGATTTACGTA
GGTGATGAGGA 

GTTAAATTGAATATTCTTGTATAATTTCAAGAGCTGTGACTACTATTTAAATTTTTTCCACTTCACTTT
CTTTCTCTTCT 

TTGACATTCAAGTTAGTCTTTCTGTATTTGAATAATACTACATTTATCATGTCTCACGTCTCAATTGTA
ACTGGTGCTTC 

TAGAGGTACGTTTTAATGAACAAAATCTATGATGTTGAGACTTCCAATTTGAACTTTAGTACTAACTCA
AATAAAGGCAT 

TGGTAAGGCTATCGCCGAAATTCTTTTAAAAACTCCATCTTCAAAAGTTGTGATTGTTGCTAGATCTCA
AGCTCCATTGG 
AATCTTTCCAAAAGCAACACGGCTCGGACAGAGTAGCATTTGTTGCTGGTGATATTACAGATCCAGC^
CGTCTAAGACT 

GCTGTTGAAACTGCCATCTCCAAATTTGGTCAATTAAATGCTGTCATGTTGTAATATAGCTAATGCTAA
TTCTTGATTAG 

TGTGGAAAGCCTAATAAGGTTATATTGTGCACAGGTTAACTACCTTAATATAGTTATTGTTAATACAGT
TATTGCTGTTG 

ACTACTATTGTTATTGTTAAATTAAAGTGTTAGGTTGAGTTAATTGATTAGTGAAAACCAACTAACTAC
CGTATTAAATT

ATTGTATTAAGATTGATTCCTATTAAGGATAAAACAGAGAGTGTGTTAGAAAGAGAAAGGGTGGATTAT
AAATATGTGTA

AAATCCCCTTTAGAGACTAATCACTAGAAATCTATTGATGGTTTCATATATAGAGTTTAACGATTATAT 
TTATAATATAA

GTTGGTAGTTGCTAGTATATTTGAAAGCACTACAGTATAGTATGTCAGAATCAGATCATTTAAACTCTA 
CTAATAATACA 

GGAAACACTTTCATTAGTCTAGATCAAGCCAGTACAATAATGGCAGATCAAACTCAAGGAGTTAACCCA 
CAACATTTTGT 

AGTCGTAAACTTGAAATTCAAAGAGAAGGGGGGGAATTAAATTGGGTGCAACGTGTTTGTCAAAAATTT 
GGTGTGAAAAA 

AATTAATTTAACACTCTGCATTGTACCATAGGGAATATAATACCCAGAAATAAGAGAAATTATCACGTG 
AGACTAAAACT 

AAATATAATAAATTAATATCACAATTGAGAAAGACACTGAAACTAACTTCTTGGTGTATTAATTTTCAA
CACTTGATCAC

AAGTGCGGGGATTAATCATAATTGCAAAGAGTGTGTTAGAAAGAGCGAAGGTGGATTATGAATATTGGA
GAATCCTCTTT 

AGAGACTATCCGCTAACAAAATAGATGAACTTGCTCAACAGAAACAACTAATCGACTAACTGACTAAAA 
TTAATATACTA 

AGTATAGATTAAGTTATCACGTTAATATTCTATACTATCCATCTCCATCACTTT 



FIGURE 39 

>retrotransposon_20 5734bp LTR zeta: 3344-3851

GAGATTGTAGTGAAGAATTCAGCTCATTATTACTGTTTTGTCGTTGCTGGAAGGAGGAGGGATAATTCA
ATGCGCCACAA 

CAGTGTTACTATGCATGTGGTTCTGACTGACTGATATTGTTTAAAAATTAACCAGCTCTCAAATAACAA
AAGTTTAAATT

TTCAAGGTTTGTAAACATGGCAGCTAGTAGTAGGATGGTTCATAATATTAATTAATTATTAGTAATAAT 
GGCTAAGTTTT 

TGAAGCATTGTTTTAAATTTTCAAATTGAAATTCAATTTCATTACAAATGGATTACTAACGGAATTCCT 
AAGCTCAACTG 

AATACCGTGATTGAAACATTTGAATTTGTATCTTTTAGATTAGCTATTTTTACTTTTTTTGTCATTGTA 
GTTGGTTATGA 

TAATTACAAGAAACTAAAGTTTAATATTTTAATATTCATTTTCTTTTTTGGCCAACTTGCAAATAACAC 
ACAAACCCAAA 

ATTAAATAATTAGATTTAATGCATGCATAATTACACAGAATGTTTAGCCTTAACAAGTATTCTAGAAAC
AAGAAAGAAAA 

AATGTCGTCTTGGCGTTTATCTTAATTGTATTCTGTAAACTGGGTTAATTCTTATTTCCAACTTTTCAT 
TTTTTTGGATC 

TTGTATGGAATAAAAATTAAATATGGTATGTTTTAGGGTTGTATTAACAATACTTACAATTATCAATCA
TACAGCTTTAC

TATTTTTATTTATCAGCAAATAGGGGAATTCAAGTTGCATGTGTTATTCAGTGGCAGTGAATCATAAAA
CAGCCAACTTG 

CAGCTTATTTCACTCCAGGAGCAATCATCACGGAATTCCGTTTCCCATCTCATTTTCATACTCTGTGGA 
TTATGTATAGA

GGCTATTTACAATATCACCAAGCAGTAAAACATTCTCTCCTCAAAATAACAATAAGATTAGTCAAGATG
AACGACTTGAA 

TCTATTCATATGCATTACACATTTAGTTTCTATTACAAATAGTGATGCAATGGTGCAAGATTACGTCTT
GTCTGCACTAA 

CTATTTGTAACGATGATTATGTGATCAAGAATTGGAATTCTTATTATATTCAGTCGTGAGTGTAAGCTA
TTTCGTTAGGG 

TTATCTTAACTCGAAGTTAAAGTTCCAAAACTATTCCATTTGGAGTTTCTGTTGTTGAGAAATACAAAA
TACTCTTCTTG 

GTGGGGAGGAAATCCATTAATGATTATAAAATGAAACTCTTGGTAACCTAATTGAAACACCACATTCAG
TACATTTTCAA 

CCGTCACTATTATTATTGTGGCAAATGGATTAAACAATAGACCTAACTTAATCTAATGGAAATTTTAAA



TCCATGAAAGG 

GGTGAAAATTTGAAATCAAAATAACTATCTGAACTGAAATACCCCATGGATCTGATATCTTATACAATC
TATCAACTAAA

CAGGGAAGAGTACCTGGAATTCCAAATGACAATTCCTATTATAATTATTTAAACAGACTATGCCGTATT
GTTTGTGACAT 

TCATTGTTTTCCACAACTCTAATGTCAAATTTTTGTTATTGTCATGTAATCCCGGTGTTTCTTTTTTCT 
TTTCGGTGTTG 

CGTTCCATGATATTTTGTTATCTCTTGTTTAGATTGAGATAAAGAATTGGTTAGCAGTGTAGCCATTTA 
TGAGTGGTTTG 

TAAAAACAAGAATTACAAGGTTTGAATGAATTCCAGGCAGGCAGTATTATAAAACCTCGAAATAACTAA 
TCAAACCATCA

GAAAAGAAAGCTTACTATGATGTACTGCTTAATCTCATATCTATCTTACAAACTTAATTCACTGATTGT 
GGCTTGTCCGT 

GAATAATTCGGAAACCTTGTCTTTTTCGGTCCAGTAGGGGGTGCCATAGTCTTGGGTGGTGACAAAAAA 
AAAAAAAATTA 

TAGTTGGGGTGGTGGGGTGTACGTCTGAGTAAGTCAGGGGAATGAACTCAAGACAAAAATAGAAGTTCT 
AAACATGGTAC 

GTTCTGCTAAGTAATATCATCGATCTATCTATTTTGCTCTAAATTTTCATAAGCAAATCCAGAACTTCC
TCGTCAGTTTC 

AATTTCAAGCATACGAAGGGATAGTGATTAAATTATATTTTGAACCTTCTATTACTGATTAAGTGTTCC 
TATTAGTCTAC 

GGATTAGACGGTTAGAATGGGATTTNCAAAAGCACAAAGGTCAAGACTTATAGGAAATTCATAGAAAAA 
ACACTCTGAAG

TACTCGATGGTTGGATATATAATAGTTTTGCTAATTTAAACTCTTGCTGTTCGGCTAAGCTATTGTACC 
CAAATGCGGTA 

CTCCGATAGTCTTATAAATAATACTTGGCAAAAGTTCAATAAATATATGTCAATGGTATTGCTTTCCAA 
TTACCATTGAC

GAGGTTGTAAATTAATTCATACTTAGGTGACATCGATTAATTTAACAAATATGTCTGTTTCAACGCTTA 
CATCATCAGTC 

TTGCAGGAAAAATGTTATTGCCACGACACCTCAAATTAGCCCAACCCCTTCGTCTACCAAAACAATGTC
AAAAACCCACT 

TAAAAGAAGTCGGACAAACCTGAACCCGGTATTTTATAAAGTAGTTTTGTGAATAATATCAGTACAACG
ATTACACTTTC 

CGTCTCAAGACTGGAAGTTGCAAAGCCATGACAATTGCTCAACCAAATGTGAATTTTTAGGTTCCATAG 
TCTTGATCGGG

TAATGTAAACACTTTAACTTTTAGTAAATGATACCACCAAGAAGAAAGCACTATTTTAAGCTTTATTTA
ACACTATACAT 

TGGAAAATAAAAAAGTGGCTATGAGAATTAAACAAGATGACCGAGTAATTAAAATAGTGCTGTCGGTGT
TAAGCAATACC 

GCTAGGGTTCAATCAATTAAGTGCTGCTTTTTTTTGTCGTTGTATTTCCATTCCTCCACTCCTTTCTTT 
ACTCTTGCAAT 

CTAACATATTTTTTTTAAAAAGAAAACATATTGATACTTACATGTGGTAACTATTGTCTGATTCATCAA
TTCCGCTCTTC 

AATCTCGGTGTTCGGATAATTTCGATGAAATTATAATTACCTGCCGCAATTCTAGAAATTCCTTTTTTT
TCTTTTCTTTT 

TCTCGGAGTTGGTTCCAATACAAAGATTGAATTGAATTAGGTGAGAAGAAGAAGAGTCTTAACACCAGA
TGTATTACAGC 

TTTAAACTTTGTTTCTAATTTGACCACAAAAAGTTGTCTGGACGCCTCAGTTTGAAATTAGTTTTGGGA 
GATTTCTGTTT 

TCTCATTGGCCTTACTCTATGGAAGTTTTTATACAAGAGCTTCCTTCTAAAATTAACTCTTTGTGTTGT 
AATATAGCTAA 

TGCTAATTCTTGATTAGTGTGGAAAGCCTAATAAGGTTATATTGTGCACAGGTTAACTACCTTAATATA
GTTATTGTTAA 

TACAGTTATTGCTGTTGACTACTATTGTTATTGTTAAATTAAAGTGTTAGGTTGAGTTAATTGAATAGT
GAAAACCAACT 

AACTACCGTATTAAATTATTGTATTAAGATTGATTCCTATTAAGGATAAAACAGAGAGTGTGTTAGAAA 
GAGAAAGGGTG 

GATTATAAATATGTGTAAAATCCCCTTTAGAGACTAACCACTAGAAATCTATTGATGGTTTCATATATA
GAGATTAACGA

TTATATTTATAATATAAGTTGGTAGTTGCTAGTATATTTGAAAGCACTACAGTTTAGTATGTCAGAATC
AGATCATTTAA

ACTCTACTAATAATACAGGAAACACTTTCATTAGTCTAGATCAAGCCAGTACAATAATGGCAGATCAAA
CTCAAGGAGCT 

AACCCACAACACATTCTTCTTGTAAAATTAATTCTATTATAATTCAGGTCTTAGTCGACGCAAAATACC
ATGTTGCAATT 

GTCCGTAAACAATTATACAACAATTTAACCAATGCAACATCAATTGAAATCAAGAATTCAACACTTGAA




CATTTTTCTTG 

TTTTCAGATCTCGTCAAAACACCAGTCAATAAAGCTTGGAAAGTTTTAGCACAACCATCAAAGTAGAAA 
GCCTAACTTAT 

AGGTTCGAATTACGTGAATTTTGGTTTCACTAATCACGCCCCAAAAAAATTCANAAAAGCTTAGTATGT
AACATTTATTG 

CAAATTTTTTATTGTTCGTCATAAATGATAATTAGTAAATGAGGTTACAGAATAGTTATGTTTTACTTC
ATAACCAATTC

TACTATTTTTTTTTGTATTATAACCTCGGATAACACAAACAAAAAAAAAGTACTACTACCAATTAATGT
TTAGTAGATTC

TACACAAACTTGATAATGCGGGAGTTATTTTTTTTTGAAGCCACTTTATTTTCAGCCGACTTATCTAGC 
TACGAGACAGA 

ACAATACTTAGCACTAATTCTTAAAATTCCATACTATTTCTATCATTCAAAATGCATTTTAACAATCAA 
TTGTCAAATGT 

GAATGCAACAAAGTCCTGAATTTATAAAAAAAAGTAGATCATTGATGCAAAAAGTGAATTCTTTGGAAA 
GCTTTACTTTG 

AACCGAAAGGAGAAGGCAAGTCGTGCAACAAGTTATTATTTCGTGTACAGTATCCAATTTTGGTTTTTC
GACACTAGGTC 

TAGACTCCAGAAACAAAGTCCTAATAAGAAAGGTGTTCAAAAACAATTTAATTTTAGTAAAAAAACACA
ACCTGCATTTC

GCAATTTATGACCAAATTGAGTTAGCTAATTATAGGGCATCAACAATAATATCCAGCCTCACACAAATC 
AGAAACAGTCA

TATAACAACTCGAATGCAAATATCAAGACTATGTTATGATAAGAGTAGTTGGGCCAATAAGATAAAACA 
GAAAAAGAAAA 

TTTTATATTCTTTAAATCTTTGGGTGACAGATCAGCTCCAATTCTCTTGAAATTGGCACAAATACTTCG 
TCTTTTTTGAT 

TCATCAGTATATCACGTGTAGAATTGATGCTGATATTCAAAAATTACCCCTAAAGTTGCTTATCAACGC
AACTTAAGATT 

TCATACAAGTCGATAACGAATCTGAATTTCAGCTTGCTCTTAGATTAAACAAAATGGTAGATTCAATCA 
ATTAGATAACG

CCAAATAACATTTGATGTTTTGCGGCAATATTTGGATGGTGTCAACTAGGAGAAAATTGATTCCCCGCC 
ATATCTCATAA

GCCTCTAGCTGTCCACTTTTCTAAATAATTGATATGGATCACCACATTGGGGTCTAAATGAAACAACGT
AACCCGAAAAC 

GTGTCAAATTCGGAATTCGTATGTATAATTCAAACAATACAAGAAATATGGAGAAAGCAGATACACACA
TACACACTCAA

AGAGCTTGGTAGAATAACAATAACTTGATATAATACGTACTATTCATACACAATTACTTAATTGATTTG
CAATCATTCCT

AAAAAAATTCTCTTTTATTTTTTTTTTAATTGGTAATATCGGTGGTATACAATGATTTACCTAGTTAAA
CAATTGAAAAC

AAGAAAGTATAAAATTTCTTCATTTATTTTGCTTACCCTCTACCTTGGTAATTACACCGATGTGAGTTT
GGAAATCTGAT

AATCCCAGAAATTGGATCTAATTGGNTCATATTTAGATTTCAACAAATCATAAACAGTTCTAGACTCCA
TGTATTTCTTT 

TGGTGTGTGTATATTTTTGCCAATGTCTCCAAAGCAAATGGAACTCGTCACTTG 



FIGURE 40 



>retrotransposon_21 1875bp LTR zeta: 812-1319 

CCTCCGGCCGCTAATTACAAGGCTGCTTTATATTGTTATACCTTGGGGTAAATGCCCTCTGGCATTGAG 
CTATTTCCAAT

TCCCACTTCGGTATTTTTTTTTACAGCCTCGTTAGACGAGTTCTTGATATTACTAAATTAGTTGTTTAC 
TGAGTGGCCTG 

ATGGTTCCTCGTCACTCTAGTTTTTGGTCTATATAAGGGTCAGAAATTTCCCTTCTCCTTAGGTCCATC 
AAGTCAAGATA

TACATTAGTTGGTAGCATCGTATGGAATTTTCGTATGAACGGCATACCAAGTATTAATTTCCGATCGAA
ATTTTTTAGGA 

CGTCTTGATAATCAGGACAAACATCATGAAAGGTCTATACGACGAAAGTTTACTTTACACAAGGGGAGA
CCATATGTCTT 

CTTTATTAACAACTAGTTATATAGCGAACAAATAAGTTTATACAGAAATATATGTACACAAACAAAGTT
ATTGTTTATTA 

ATTATTTAATTAGCTCGGAAGAATAACTCTGTGATACTGCATACATTCAAACAAAATCAATCTAGTTTC
CAACATCTTTT 

TCACTTGGTAATGTAATTATTCTTGTTCTGGCACCGACAATGGGTATTGTTTTGTAGCTGGAGGACTAA 
TATGGGGTACC




ACCTCAATTTTTGGATCCCAGCTCCCACGCAGGGGTGGCTTCTGATCTAACTCACTTTCGAAAATATCC 
TGATAGTTTCC 

AATTAATTCAGCAAAATAGCTCTTGTTTGTACCCTTAACCAATGACATGATATCCTTTTTATTATCACC 
GATACCACCTG 

TGTCTTCGTCTTGTTGTAATATAGCTAATGCTAATTCTTGATTAGTGTGGAAAGCCTAATAAGGTTATA 
TTGTGCACAGG 

TTAACTACCTTAATATAGTTATTGTTAATACAGTTATTGCTGTTGACTACTATTGTTATTGTTAAATTA
AAGTGTTAGGT 

TGAGTTAATTGATTAGTGAAAACCAACTAACTACCGTATTAAATTATTGTATTAAGATTGATTCCTATT 
AAGGATAAAAC

AGAGAGTGTGTTAGAAAGAGAAAGGGTGGATTATAAATACGTGTAAAATCCCCTTTAGAGACTAACCAC 
TAGAAATCTAT 

TGATGGTTTCATAGATAGAGATTAACGATTATATTTATAATATAAGTTGGTAGTTGCTAGTATATTTGA 
AAGCACTACAG 

TATAGTATGTCAGAATCAGATCATTTAAATTCTACTAATAATACAGGAAACACTTTCATTAGTCTAGAT 
CAAGCCAGTAC

AATAATGGCAGATCAAACTCAAGGAGCTAACCCACAACACGTCTTCTTCAGTATTAGGGAACAACATAC 
TAACTTGACCT

TTTCTAGCTTCAACCAAAAATTCCTCTATATCCATTAATGGAATTTCATCAAACTGAGCAGCCCCAAAA 
AACGTTTTGCT 

TCCAAAGTCTAAATGAGCATGGAATTTCCTTATGAAAGGTATACCAAGTATTAATTTCTTATGGAAGCT
GTCCACTACAG 

CAAAATTCTCTTGGAATGTAATACCATTAAACTGGAACTTGAGGTTAATTATTTGGTTAAAGTTTCTGT 
TGATTTTTGGT 

CCAATAAAGTACCCAAACTACTAGAGCTCCAACAACATTTTCAGAAAATGGCCAATAATACAATAAGTG 
GGTATATTTTA 

TCAAAAGAGTTTATATTATGGTTACTCGACGGGTATTATTCTCTGTTGGATTAAGGCATCTGGGCGACC 
CAGTGGGACCA 

AAATTCCAGAGTAGTGGTTTGGTTTAGGACTTTACCAAGGNCCATGATTAGGGAATATTNTAACCAAAA
AATTAAAATTA 

CCATTTAATTCNAAAACCTAACCTAAATTCCCTAA 



FIGURE 41 

>retrotransposon_22 1712bp LTR zeta: 672-1179 

TAACCATGGAATTCCTNGAATTANTNATAATTAACCAAATTTTTTAGGGNTTATTAGGACCTAGGATTG
AATTCCATGTT 

TATTTAATAATTAANCCCCAGTTTGGCCAACTATGAAATAGTATAATGGTTAAATGCAAAATAAATATA 
GTATGAACAAT 

ATGATAGTTTTAGTGTGAATTTTGAATAAGAAAAAGAAGGGATAAGGATATTTTTACTAGGAAACTCAA
TTATAATTACT

AATGATAAAAACTCCATCAGCTACTATTATTACTCAAATTTTAAATCATTTGTTTATCACCTACACAAA
CAGGGATTGTC 

CAATATTGATTACTAAAATTAGAACAAATAAGAGAATATAATTGAAGTTAAATAATTCTTTTACTAAAT
CTATTGACCAA

GAACTACATCAAGGGAAAGTGTTGCATATACATCTAATGTTTATTCTTGGTTAGAGTATTGATACAAAA
TTATATCATCA 

CCAACGAATCACATTAAGGGAAAGTGTTGTGCATATACCTGATGCTTAGTCTTGGTTAAAGTATTTGTG 
TGAAAGGTTAT

CGTGACCAAAGATTATAGTAAGGGAAAGTATTATGAATAAATCCAATGTCTACTTTTACAGAAGTATTG
ACATGAGAGAT 

TATAACTATCAAGAATTGCATTAAGGGAAAGTGTTGTAATATAGCTAATGCTAATTCTTGATTAGTGTG
GAAAGCCTAAT 

AAGGTTATATTGTGCACAGGTTAACTACCTTAATATAGTTATTGTTAATACAGTTATTGCTGTTGACTA
CTATTGTTATT 

GTTAAATTAAAGTGTTAGGTTGAGTTAATTGATTAGTGAAAACCAACTAACTACCGTATTAAATTATTG 
TATTAAGATTG 

ATTCCTATTAAGGATAAAACAGAGAGTGTGTTAGAAAGAGAAAGGGTGGATTATAAATATGTGTAAAAT
CCCCTTTAGAG 

ACTAACCACTAGAAATCTATTGATGGTTTCATATATAGAGATTAACGATTATATTTATAATATAAGTTG
GTAGTTGCTAG

TATATTTGAAAGCACTACAGTATAGTATGTCAGAATCAGATCAATTAAACTCTACTAATAATACAGGAA
ACACTTTCATT 




AGTCTAGATCAAGCCAGTACAATAATGGCAGATCAAACTCAAGGAGGTAACCCACTACAGGTTATGAGC 
CTCGCCCGCTT 

ATTGAATTTAGATAATATAGGGGCAATGAAAGCTTTTGAAAGTGTTGATTTTCCTGAATCATTAAAACT 
AGAATCCAAGA 

TTAATTTTCAAGTGTGGAGAAATGAAATCCTTAGATATGCACGTGGTATTGGTGCTGAGTTTGAAAACT 
TTGTATTGAAT 

GAAACTCCAGCTCACCTGTATGATCTTAGATTGGGAAATATGCTTCATCAATTATTGATTCGCACTGTG
AAAGAAAAAGT 

TAGAATGCCTAGGCAAGAACTTGGAAAATCAGGAAAAGAACTTTATCTTGATCTTATTAAATCATTCGG
TACTCAATACC

CATACGATAAATTTGAGATAGTTAAATACTATTGGGATCAGTTAACAAACCCTTTAATTAATGTGAAGA
GACGTTTTGAA 

ATTGAAGAAGTATGGGTTCAATACATTAATGCTCAAACTGCAACAGAGAGAGAAGTTCTTAATTCATTT
GTTTGGTTACA 

TTTGTCAAAATCTATATTACCACAAGAGTACC



FIGURE 42 

>retrotransposon_23 1540bp LTR zeta: 467-974

TGTGGAATTAAGATGACTTTGTGATTAAATTGTTGACTTCTTTAAGCCTTTTAATGTGGAGGAAAAAGA
AAAATCTATAA 

NTAAAAAAAAAAAAGATAAAGCAGATAATTCTTTGATCTTTATATACTTGGTCTATATGTAGTAGGGGA
AAGTCGGAGTC 

GGAATTTGAAAAAAAAAGAGAAAAAAGAACGAATATTTAGACTGTAAAATTCAAACCCCTGCTGATTAG
TATATAAAAAA 

AATGAGTTCATTTTTCCTTTCTTTTTTTTTTTTTCGCGCGGATAGCAACGGTCATTAAGTTAACGAGAT 
AAAAAAGAAAC 

AACCAGATAATTATGAAAAGTTGTGATGGTGTCACGTGCGAACATGAGAGTCATGAATTTTGACGAAAA 
CGTCAAGCTTC 

AGTTTACAAAAGACCTCTTTATTAAAATCGAATTGCTTATAGGGTCGTCGATGATGAGAAGGTGTATGT
TGTAATATAGC 

TAATGCTAATTCTTGATTAGTGTGGAAAGCCTAATAAGGTTATATTGTGCACAGGTTAACTACCTTAAT 
ATAGTTATTGT

TAATACAGTTATTGCTGTTGACTACTATTGTTATTGTTAAATTAAAGTGTTAGGTTGAGTTAATTGATT
AGTGAAAACCA 

ACTAACTACCGTATTAAATTATTGTATTAAGATTGATTCCTATTAAGGATAAAACAGAGAGTGTGTTAG
AAAGAGAAAGG 

GTGGATTATAAATATGTGTAAAATCCCCTTTAGAGACTAACCACTAGAAATCTATTGATGGTTTCATAT
ATAGAGATTAA 

AGATTATATTCATAATATAAGTTGGTAGTTGCTAGTATATTTGAAAGCACTACAGTATAGTATGTCAGA
ATCAGATCAAT

TAAACTCTACTAATAATACAGGAAACACTTTCATTAGTCTAGATCAAGCCAGTACAATAATAGCAGATC 
AAACTCAAGGA 

GGTAACCCACAACATAGAATACGTTTTCAACTACTTAAGTATCCACTAACCTAAATTTTTTTTTTAATA
AAATTTCATTG 

TATTAGTCTTTCTTACTGCTTTTAATCAACTATAAGTATAGGTTTCCGTTTTTTTTGCAGTAAAATTTA
TCGTTCAGGAG 

AAATAACAAAATGTACACGACTTATTCGCAGCATTTTTTTTTTTGTTTTGGGTTTTTGTATCAAATTGT 
TACAACAACAA 

CAACAACCTCAATTCTTAACCAAATCTACCCCTCCTATTTTTTTTNCNCATACACACAATACATCTTAC
ACTATCTTTTG 

ATAGGCTTTATNGAAGANGTATTTANGGNGTGTAATGACAATCTGCTTAACNCATATATNTATNTANNG
NNNGTNGTCAA 

CAATAGCTTTATCTACTTTTTTTTTTTGGNNACNCCNGNAACTTCAGGNCCACNNNTTTGCCNATTTTG 
GGGCCCCNATT 

NGGAAAACATGGGNATTGGGANNACAGCTTTTTTTAGGNNNAAANGGGTNTTNCCNTTTNTGGTGGGCT 

TGGAAAGNAAC 

AGCNTNTAAANNAATGGGCT 



FIGURE 43 

>retrotransposon_24 2025bp LTR zeta: 787-1294 






TGGGGAGCAAATGTGAAATTAAAGAGTGTGGTGATATGTAATTTTTTTTCAAAAAAGATTGGATTGACG 
AAGCATTATAT 

ATTCGTCTAAAAACCATTTTTGCTGGTTCCGCAATAAATCTCGGAGATTATTTCTCGATTACCAATTTA 
TGTTGTTTTGT 

GACATTTCTTATATTTTGTTCTATTTTACACGACTATTTATTGTTAATAAATATGTCACCTAAAGAATA
TTTCTATTTAG 

TTTTACATATGTTTTTTGACGACAATCAACTATTACAAATTAACCTACATTTTTTAATTTGAATATATA
CAATTTATATT 

GAATTAACATTACCATTTAGNTTTTGATAAGAATAGATTGCGCTATTTCAAACATTTGTTAAATTATTT
ATTGTGAAACA 

ACTATGTAGAATAAAAGTATGAACAAATTCTACGTTCATCATGTGGGGTGTGCCTTCATATATATCTTT 
GGATGAGAATG 

CCAAGAAAAATGATGGCGTGACAATTCAATACGGCAAAACAAACTAATCCCCTCTAAGATTTTACTAGT 
GTGTTTCCCTA 

TCGTCTGAGGAAAAGGTAACAAAACATCGTTTAACCAATTGGTGTTTGTTACGATGGTGACGTTGAGTA
CTGCATATAGT

TGCAACGGCAAATTGCATCCAGCGAGTTAACAGCGAATGGCAAAGTGAAGCCTCCGACTTGTGTTCATT
GACTACTGGGA 

TTGGACTGGGAATAACGACTTAACTAATTAATGTTCTCGTGGACTCGTTTAGCTAGAACTAACATTTGT
TATAATATAGC 

TAATGCTAATTCTTGATTAGTGTGGAAAGCCTAATAAGGTTATATTGCGCACAGGTTAACTCCCTTAAT
ATAGTTATTGT

TAANNCAGTTATTGTTGTTGACTACTATTGTTATTGTTAAATTAAAGTGTTAGGGTTAGTTAATTGATT 
AGTGAAAACCA

ACTAACTACCGTATTAAATTATTGTATTAAGATTGATTCCTATTAAGGATAAAACAGAGAGTGTGTTAG
AAAGAGAAAGG 

GTGGATTATAAATATGTGTAAAATCCCCTTTAGAGACTAACCACTAGAAATCTATTGATGGTTTCATAT
ATAGAGATTAA

CGATTATATTTATAATATAAGTTGGTAGTTGCTAGTATATTTGAAAGCACTACAGTATAGTATGTCAGA
ATCAGATTATT 

TAAACTCTACTAATAATACAGGAAACACTTTCATTAGTCTAGATCAAGCCAGTACAATAATGGCAGATC
AAACTCAAGGA

GCTAACCCACAACAGCATTGATTATATAATCATCTATGTAGCCAATATACACTACCGTCCAAACTCCCA 
CTACACACTTG

TAACAGTGTTTTACAAATCTATGAACGAATAACCGATTCAAATGACACAATAAAGAACATTTCACCGAT
TTGAATTGCTA 

ATCGGTACTATAATATTGATGGAAGGTTAAGAGTTTAATGCTACCCTAGGTTTACCGGAGATCAACAGT
TGCATATACAA 

AACGTGTTATCTGTCTACGAATGGCTTTCTATGTGTATAAAATGTTTCATCAATTGATAATTAATTATT
AATCTGCTTAC

TCCATTTCTATCCTTTT

CAAATTGCCGATATAATGAAATGGAAATTAAGGGAAAAAAAAAAGTTTATATCCAAATTCATGCGATTA
ACAGGTTCTTG 

TGATTATAATTGGTAACCCCCTCCCCCCTAAAACTCATATCTGCCAAAAGAGGAGGATATTTGAATATG 
CTATTATGAAC

CCCATTGATTTTGACTACAATTGGATTTGTCGGGTATTGAAACCCAAACATATTATAATTTGCTATGCG 
TTTAAATCAAC

CGTTTACTGGTAGATCCTATACTATAAATACAGCCAACAATCCCCAATTGTTCAGATAAAGTAACACTC
AATATCATTTG 

ATCAATCAATCAAGAGGATTACAAA 



FIGURE 44 

>retrotransposon_25 3583bp 

AAAANNTTCCCCATNGCCTATTCCTAGGNCCCAAAACCAGTTGTCCGAAACTCCATGGATGCCAGAAGT 
GGTGGTCCTCC 

GCCGTTATGGTTGGAAAAGAAAAAGAAACTTGACGAATTGAAAGTCAAAGAAGAGCGGCAAGAAAGAAG
GAAGAAGGGGC 

AAAGAAAAAGGAAGAAGAGGCAANGAAAAAGGCAGAGGAAGCGAAGAAGTGTTTTATTTTACTTTTCTG
TCAAATTTGCA 

CTACTTTTAATTTGTGTGCAAATATTCTATTTTACTTGATTTTTATATACTTTTATTTTACAATACTTT 
TTTATAGGACT 

TTTTATATCTTTTCTTTATCAACTGTTCGCTATAGGGTAGGTCTTCCAAGCTAATTTTACCCGACACAA 






GATGAAATATT

TTCTGTTGAGCACTCGTTGTCGACAGTGAAAAATTTTCACTCAAGAAAATATTTTATCATCACTTTTTC
TAGAAGGGAGG 

TTCAAGTGTTGGAGAATAGACAGCGAACACCTGATATTCCCAAGGTCGAATTAGATTGAAAGATAAATA
ATAGTCATATT 

TATTTTGTATTTAGTCAATAAATTATCTTTTTATATTTAAATTCTTAGTATTGTCATACCACGTAGATT
GATACGGACAT 

ACTTAGCACATTTAACATATATTAAGCACCGATTACCTGTGACATTCCGGAGTTTACTGTTTCGCGCAC 
GCTGGCAGACG 

AACATCAACTCATCTTTTATACAATATATTCTTACGATTATAACTTTCAATTAAGAAATACAACTTCTT 
ATTAGCATTCT 

CCTACAAGTTCTTAAGTTCCTAGGAATTTCTTCGAAACTATAATTAAAGACGGAAAAGTGTAAAACAAA 
CAGAAAGCAGA 

GGAGGCCAAGAAGAAAGCAGAGGAGGCCGCCCCACAAAAGTTTGACAACTTTGACGACTTTATTGGCTT 
TGACATCAACG 

ACAATACCAACGACGAAGACATGTTGTCCAACATGGACTACGAGGACCTAAAATTGGACGACAAAGTAC 
CTGCCACCACA 

GACAACAACTTGGACATGAACAACATACTTGAAAACGACGAGCTGATACTAGACGGGTTGAACATGACA 
TTGCTCGACAA 

TGGCGACCACGTAAACGAAGAGTTTGATGTAGACAGCTTTTTAAACCAGTTTGGTAATTAGGGGCTCTG 
TTCTACAAGAC 

ATATACAGATAGTGCAGGAATAAGAAAAGAAATATTTTATATAGCTATATATTTCAAGTGTTTATTCTG
TTCAACAAGTT 

CTAACCGTAGATACACCAAATCACCAAGTCAGACATTACTGAGCTAGCTTAACGGTCCAACTACTTTAA
ATTGCAATCCG 

TTCTTTACTTGAGTCAGTCGACTCTACAACAACTATCCTGAGGTGATTATTTTTTGGTGGAAATTTTGA 
CCAAATTCTTA 

AGCAAAAATCTAGTTTCTACTGATAAATAAATACACATTGCTCTACTTCTGTACTCCACACTCTGCTAT 
TGCTTGATAGC 

CATCCTTAAATCAACAGAATCCACTAATTCTGCTACTTCCAGAACCATGACTACTCTACATTTTTAACC
ATCTCAATTAA 

TTACCATCTTTTTCTCTCATTATTTGGCACTATGGCCGAGTTGGTCTAAGGCGGTAGACTCAAGAATTA 
TTCTTCTCCTG 

CGATCCAGGGGTTTCTACTATCGTAAGATGCAGGAGTTCGAATCTCCTTGGTGTCATTATTTTTTTTTT 
TCCAAGAACCT 

CTCATTTTTTTTTTTCAAAAATTATTTCTACAATTTCCTCTATTCTTAAAAATCTTTGGTATTAAACTA 
AAAATGTACCT 

AACTAAACTACTAGGCTGGAAAATAATAAATCTAACGTTAACGAAATAAGCAAAAGTAATTTTTTTTTT
TCAAGACAATT 

CCATGTTTGGGGATGAAAACTGCCTGCAATTATATATCCTGTAACAATCCCCTTATATCAACAACAACC
CGAGAACAACA 

AAAAGTCCACTGGCAGAAACCTTACCACCAATATTCTCAATTTGTGTCACTGATTGGGCAGTTTGTGTC 
GATATCCATGA

TGTGGTCAAACTGGCAGCAGTGGTAGATGGATAAACACTTTCAGCAGCAACAGTAACCGAGTTGACAAC
TTCCTTAGCAG 

CTTGTGTATCACACTCTTCATCATCATCCCAGCTATCATCCTCATCGTCACACTCTGGTTCAGGAGTTT
GATCATCTTCA 

TCATCGTAGCCATCTTCACCAGGGCAAACATAATCGTTACCAGATCCACCCCACCAGCTTCCAGACGAT 
CCACCAGTAAC 

TGAAGAAGAACCGGAATCACCTGAACTAACACCAGAACTGGATCCAGAAGTAGTACCACCACTTGATCC
AGCACCAGAAC 

CCCACCAAGAGCCTGTGCCAGATCCAGAACTTGATCCACCTGTTGGCACACATTCGCCATCATCTTCTT
CATACCATTCC

CATTCACCATCATCAGAGGAGCCACTGGCAGAACCACCGGCATTGTCTTCCCCTTCATAGCCATCATCT
TCCCAGTCATC 

TGGATAGACAGTGTGTGTGGTAATAACAGTCACAGTCGTGGTATATAGCTGTCCACCTGGAGCAACAGT
TGTCAGTGGAC 

ATGTGGTTGTGATTGTCAACGTAACAGTTTCATCACAGATTTCACCAGATTGTGTGAGATAAGTGGTAA 
ATGTCTGACCA 

CCACCAGTATATGTGATAGAAACAACTTCCGTTTCAGTATGTTGATTAGTGGTTGGAGGTAATTTTGTG 
GTGAGTGTTTG 

AGTTGTTGGCACCCCATCGGAAGTAAATGTTCTAGTGGTTGACACAGTTGGATGGATAGTAGGAATTTC
AGTTTCACAAT

CAGTCTCGTCATCGTCGTCATCAGAAGTGGTTGACTTTGTTGGGAGAACAGTAATAGATCCTGACCCAG 
TTGGAATAATA 

GTTGGAAGAACAGACGTTGTTGGAAGAACTGACCCACTTGGAATGATGGTTGGAACGTCTGTCTCACAA



TCAGTCTCAAT

TATCTTCTGTAGTGGCTTTTTGAAACAACTGACGAGACACTTGTCTTACTTTGACTGGTGATTGGAAGG 
GTTGGAATTGT 

AGGACCAAAATTTGGGGCTTCCATTGGATCTTTACACTCTCCACCACTGCACAACTTTAATTTGGAACC
ACAACTGGAAC 

TAGTTTCTGTTTCAAGGCTTTACCAGTTGACCTGATCGTAATAAGCCACGGGGTTACCAACTTGTTGCA
TCTTCACTGAT 

CAGCCATCAATCTTTGATAAGCCCTGATTTCTCTCATCTATGCAACAATCTTCTATTGTGAATCATTTG 
TTTTGCTAAAC 

TTGTAGTTGGTGTCCAAAAAAAAAAGTGATGTAAAATTTAAATTTTTCTGAACTTGTCGTGTAAAAAAG 
TCTCCAGAAAA 

AGGGACAACACACACACCAATTTTTCACCATACCACACAATTCACCAATAAGCTCTCTCATATCCATCN 
AATAATTACAG 

TACAGCCTCCTATTCNCAATTTTTGGNATTTAAACCAGTTCCCTTGGCAGGTCACCAGTTCAT 



FIGURE 45 

>retrotransposon_26 770bp POL protein: 2-322, LTR san: 390-377 

TGATTTGAGAAATACCATTGAAGATCTAGAGTTAAAAATAAGGAATTTGCATGTACATGAGGATAATCA

AGCGGTCATTA 

CAATCTTAAAGAATGATAATTTCCACCCACATAGACCGATTGATATATGTTACAAATTTCTCAGACAAA 
AATTGAAAGAT

GGATTTTTTTCAATATCATATGTTGAATCTGGAGATAATTTAGCTGACTCATTCACGAAAGCTTTAGGA 
AGAAATAAATT 

GATTGAACATACCAAAAGGATTAGAGAAAGAAAGGATTATGATAATAATGCTACACTGATAGTGGACGT
TAGGACGCTCG 

AAGAGATTAAGATAAACAAGAAATTGGTACATCATTAATTAATTTAGCTGTTTACCTGAATCAGGGGAG
TGTTCGCTATA 

GGGTAGGTCTTCCAAGCTAATTTTACCCGACACAAGATGAAATATTTTCTGTTGAGCACTCGTTGTCGA
CAGTGAAAAAT

TTTCACTCAAGAAAATATTTTATCATCACTTTTTCTAGAATGGAGGTTCAAGTGTTGGAGAATAGACAG
CGAACACCTGA 

TATTCCCAAGGTCGAATTAGATTGAAAGATAAATAATAGTCATATTTATTTTGTATTTAGTCAATAAAT 
TATCTTTTTAT 

ATTTAAATTCTTAGTATTGTCATACCACGTAGATTGATACGGACATACTTAGCACATTTAACATATATT
AAGCACCGATT

ACCTGTGACATTCCGGAGTTTACTGTTTCGCGCACGCTGGCAGACGAACA 



FIGURE 46

>retrotransposon_26 POL protein 106aa 

DLRNTIEDLELKIRNLHVHEDNQAVITILKNDNFHPHRPIDICYKFLRQKLKDGFFSISYVESGDNLADS

FTKALGRNKLIEHTKRIRERKDYDNNATSIVDVRTL



FIGURE 47 

>retrotransposon_27 598bp LTR san: 143-523 

CTTCAATGCTTCACTTGTACTAGTACCCATGATTGTATAGTGGTGTGGTTGATCGACTTCAATATAACA
AGAGAGAGATG

AGATGAGATGCTTTTATCGCGTATATATTTTTTTTTCCATTGACAATTCTGATTTCACAAATTGTTCGC 
TATAGGGTAGG 

TCTTCCAAGCTAATTTTACCCGACACAAGATGAAATATTTTCTGTTGAGCACTCGTTGTCGACAGTGAA
AAATTTTCACT 

CAAGAAAATATTTTATCATCACTTTTTCTAGAATGGAGGTTCAAGTGTTGGAGAATAGACAGCGAACAC
CTGATATTCCC 

AAGGTCGAATTAGATTGAAAGATAAATAATAGTCATATTTATTTTGTATTTAGTCAATAAATTATCTTT
TTATATTTAAA

TTCTTAGTATTGTCATACCACGTAGATTGATACGGACATACTTAGCACATTTAACATATATTAAGCACC 
GATTACCTGTG 



ACATTCCGGAGTTTACTGTTTCGCGCACGCTGGCAGACGAACAGATTAGAAGCTTGGTAAATCTTTGGT 
TATTCATCACG 

TCTTGAGAATAATACAAAGTTTAATATAGTATTTTCAA 



FIGURE 48 



>retrotransposon_28 1082bp LTR san: 558-939

ATAACCACAATAATCGGCCTCGTAAACGTCGTCAGTGGCTCAAACACATTGCTGCACCTTGAGCTCTAG 
AACAACCCCAC 

ACTCACTAGCCATCGCCACACCAACAACCAAATTGCTGATCCAGAAAAAATACCACCCCCGTAGTCCGG 
CTTGTATGGAA 

TAATTGCTTGGCCAGGTACGTCCCCACCTCATCGTGTCTTTTCTGGTTGAAATATGTCATCTCCCGGGC 
TAACAGTACCG 

TATCTCTGTGGCTGGGGCATCTATACTCTTTCATTCTCGGCTTACAAATCTATCTTGTTCACACATTTC 
ATATATCTGGG 

ACTTGTCGAACTCTCTGCACTCTATCATAAACTGGAACTCGCTTGCATTCTGGGACACACACTGGAGCT 
GGAATCCATGG 

TCAGGAAATGTGAAAATTTTCTTCTCGGGAAATATTTGTGACAATTAGTCCTAGTACACGATAGTTTCA 
TTACGCCCACT 

AAAAGTGTCTACTGAAACTCGGTCTCTATATCGTCAATATCTTTCATTTCTCTTCCTGGCTTTTCACTG 
CGACTTATTGT 

TCGCTATAGGGTAGGTCTTCCAAGCTAATTTTACCCGACACAAGATGAAATATTTTCTGTTGAGCACTC 
GTTGTCGACAG 

TGAAAAATTTTCACTCAAGAAAATATTTTCATCATCACTTTTTCTAGAAAGGAGGTTCAAGTGTTGGAG
AATAGACAGCG 

AACACCTGATATTCCCAAGGTCGAATTAGATTGAAAGATAAATAATAGTCATATTTATTTTGTATTTAG
TCAATAAATTA 

TCTTTTTATATTTAAATTCTTAGTATTGTCATACCACGTAGATTGATACGGACATACTTAGCACATTTA
ACATATATTAA 

GCACCGATTACCTGTGACATTCCGAAGTTTACTGTTTCGCGCACGCTGGCAGACGAACACTTATCAAGG
TGCTACTCCCG 

CGCATCAGTTTCCTCTGGGTTCTCTTTTTGATCTTGGTGAACTACCTTTTTTTCCCACTCGCGTGAGAA 
GTTCAACACTT 

TTTTTTACCCATCCACCAAACTTTATTCTTTTCCCCACCATG 
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Table 1 Transformed colonies per μg DNA





Plasmid    S. cerevisiae    C. maltosa    C. albicans
pRPU3      5000             8600          6500
pRC2312    1600             6500          400
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Tn^ Ai*tinn <* An tier 

JLUoCl HUH CUllLLg 


lllOCI LIUil MIC 


H963RU3     contig4-2991    19819 (map)
H963RU6     contig4-2780    9287 (map)
H963RU8     contig4-2777    6779 (map)
H963RU10
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29/10/99 09:53 BALDWIN SHELSTGN WATERS CHCH -> 



NO. 048 P025 



contig4-2991 (H963RU3)
Tca2 insertion; putative regulator of transcription (NOT4); unknown

contig4-2780 (H963RU6)
Tca2 insertion

contig4-2777 (H963RU8)

contig4-2296 (H963RU10)
Tca2 insertion; unknown

contig4-3108 (H963RU18)
Tca2 insertion; double-strand break repair prot.; ATP-dependent RNA helicase

contig4-2882 (H963RU30)
Tca2 insertion; aminopeptidase; nucleolar prot. (NOP4)

contig4-2025 (H963RU43)
Tca2 insertion

contig4-2386 (H963RU46)
Tca2 insertion; unknown

contig4-2668 (H963RU50)
Tca2 insertion; [illegible] utilization prot.; unknown

contig4-3105 (H963RU52)
Tca2 insertion; putative methyltransferase; MSS51

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Tca2 insertion 

incomplete Tca8; partial retrotransposon pol; beta LTR

contig4-2824 (H963RU59)
Tca2 insertion; unknown

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contig4-2294 (H963RU65) 

Tca2 insertion; small nuclear ribonucleoprotein; unknown
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>retrotransposon_01 994bp Incyte: 1..994; kappa LTR: 548.. 827 

TAGATATTTATATATGTATATGATTAGACCAACATAAAACTAGACGTCCAAATATTTATTTATTTATTTA 

TTGATATATATTCTTATTTATTACTGTTATGATCTTTTGATTCACACAGAGATTTAATCCAAATCAATAC 

CTTTTGTTTTGTAGAAATCTTTTGCTTCTTCAATTTGTATTTTCAATTCTTTGTATTTATGTTCTTTGTC 

TTTGAATGTAACAATTCCCCAACCTAACGTTGATAAGGCATAAGACCCAAATGTGACTAATCCCCACCAT

GGCAAGTATGGCAATATTTCATCGTGTATTTTAGCTGGAGTTGGAATCACACCTGTGATAAGAGCAAAAT 

AAATAGCTGATAAGGCAAAAATTGTTAATCCTGTTTCAGTAGCTTTAGTCATTCTTATAGTTAGACTTGT 

TAAAGGGTAGTTGTGTTAATTGAAGATATGCTGGAAAACTATACTTTTCGTTGTTTTTTTTTTTCAATCT 

AGGTCGGGTGTGCTGTTATTTTTTTTCTCTCTTCTTGGTTCTTAGTATTGGATTATATGTTGGTTTATGC 

GACGTTTGTGTCAGGGAAATAACACCTTGATATAAGTCGTGCGTATTAGGTCAACATTGGTGAAAAATTT 

GCACTCATCGAGAGCCAGGAATTAGTATAAAAAGAAGAGAAAAGAAAGATATTTAGGATATTTATTATAT 

AGGGACCGAGTTTCAGGAGACACTTTTAGTGGGCGTAAACTTCATTCACTCTGTTTTTTGCTTATTACAA 

ATTATCACCTATCGTGTACTAGGACTAATTCTCACGAATATTCCGTGTATACAAACACTTATTGCCAACT 

TATGGTGCGGAACTTTATTTGTCTGAACCAAAATCAAAGTCACATCATTTAAATGAACGTTGACATAAAT 

AGATTCTTTATTCAATAGAAACAATTTCTTCCTTTTTCTTTTCTTTGTATTATTGGTTAGATTTCCATTC 
CATATACACACAAG 

>retrotransposon_02 1348bp Incyte: 1..1348; kappa LTR: 764..1043, POL (contains stop codons): <136..714

TGTATGGTACATGTACGACAGCCCAAAAAATGGTATCATTTAGAACTGTATTGGAGAACATTAGTTTTGG 
TCCAACATTGCGTGATGATGGTATGTTTTTCGTATTATAGTACAATGATGGCTCAATGATTTATTTTAGG 
TTTATATGTGGATGATATCTTAATGGACAGAATCTCAGATGGAATCGTTATCAGATTTGTTGAACAAGAG 
AGAGTTTATTTCGCGTCAAAATCAATTTAGGTCTCATGACAGAATATGTGAGATAAAATGTCCACGTAAG 
CAAAACTGGGTGATACTTTGAATTAAGAGATACTCCTAAATAAGCAAACCAAGGATTTTAAACTACACAA 
TTCGTATGGTAAAACGTGCTTTGAGTTCCAAATGATAGATGCGAGATACCAACAAAATAGAACTGTCGCA 
AATGCTGAAGACAATTTCACTGAGGTTCGAAATGAAAAATTACTTAATTCAATTAAAAAATTTATACCAA 
AAGGTGGTCTGGAAGTGCTGATATGAACACGAAATTTAATGCATTCTGTGGAAAATTCGTTTAAGCTCAC 
AATCGGAAAATACTACCATTCTACATTTGCAGAAAATTAAAATTGTGTTGTGAAATATCTACATCCTACA 
AAGTTCAAGACATTTATTGATGGTATATTCAAAGGACTCGATGTTGAGAATGATAATAACCTGAACCAAG 
ACGCTACAAATGCT7UVTTGAGTAATTCGTAATTGCTAAACAACGCCATTTCGAATCAGGGGAGTGTTGGT 
TTATGCGACGTTTGTGTCAGGGAAATAACACCTTGATATAAGTCGTGCGTATTAGGTCAACATTGGTGAA 
AAATTTGCACTCATCGAGAGCCAGGAATTAGTATAAAAAGAAGAGAAAAGAAAGATATTTAGGATATTTA 
TTATATAGGGACCGAGTTTCAGGAGACACTTTTAGTGGGCGTAAACTTCATTCACTCTGTTTTTTGCTTA 
TTACAAATTATCACCTATCGTGTACTAGGACTAATTCTCACGAATATTCCGTGTATACAAACATTATACG 
TGTCTGTAACTACGCGAAACTACTTCGTCTCAGTTTTTTGTTACAAACAACTTTCCGTATAGACCTGAGA 
TTTTGTCAGCTTGATTGAATGGAAGAGTTTACTAAAGTACCAGAAAGGTGTTTTATAGATAACATGTAGA 
TATATAAAAATGTTATATTACAAATGACTTCCAAAAGAAACTGTACGAATTTTGCTGTTTATTAAAAACC 

AGTTCCTGAAAACTAGTATCTTAGCTTCAGTACATTTAGCCCACCTAAATTGGACCTATGACAAGTTCTA 
CTTTCCCGACAATGCTAA 

>retrotransposon_03 3034bp public: 1..85/2131..3034, Incyte: 86..2130; kappa LTR: 75..354

TGGTTGGTCTTATCAGTAGAGGAGTGAGTATCAGTTGCTGTGGTTTTTTTTTTTTTTTTGTCGTCTTCAA 
ATTTTGTTGGTTTATGCGACGTTTGTGTCAGGGAAATATCACCTTGATATAAGTCGTGCGTATTAGGTCA 
ACATTGGTGAAAAATTTGCACTCATCGAGAGCCAGGAATTAGTATAAAAAGAAGAGAAAAGAAAGATATT 
TAGGATATTTATTATATAGAGACCGAGTTTCAATAGACACTTTTAGTGGGCGTAAACTTCATTTACTCTG 
TTTTTTGCTTATTACAAATTATCACCTATCGTGTACTAGGACTAATTCTCACGAATATTCCGTGTATACA 
AAC^^TTTTCGAAACTAGTCAATCACAACAAATTTGTTTGAGTTCAACTGAAACGATAACAACCATCA 
TAATTCGATTGAATACTTTGTGTCGTCTCTTTCTTTCTATGCATTCTACTACTTGTCGACTACATATATC 
CAGCCATGTCTTGCATATATCCTAGCAACTCCTCCCTCCCCCCTATTGTTGTTGTTTTTTTTAATAATAT 
TTAGTATATGTATCAATGGTAAAAACTATTTTTTGTATTTTTTTTTGGTTTGTAAATTTTGATAGTTTTT 
TTATTGAAAACTTCAAATCTCAAAAATTTCTAATAACAACAACGACAACAATTATTAAATGATACTCTAC 
TCAAAAAGAAAATTTGATGAAATGCCAAGAACAATATAATTTAGTCAGTACATTAATACTCAATTACAAC 
AACAACAACAACAACAACAACAACAACTGTTCAATGCAATAATAAGAGAGAAACCAATAGAACTAATTTA 




GTTTTTCAAATAGCCAACCTTCAAAAAAAAATAAATTATGTGAATGCATAAAATATGTATTATTAGTAGT 
AGTTTGTAGTTGTTGTAACCAGAATTCTCAATACATACTTTTTCATATCGATCCTTTTTCTTCTTCCTCC 
TCGATTTTTGGATTATATTAACTAAATTTTGCATTTACGTTTATAATGATTTTCAATACAAAAAAAAAAG 
CATTATAAACTATATATTATCTTGAATAGTAAAAATAAATTAGTATTGATAGAAAGTTTTTTACATCTGA 
CATTATTTACTAATTTAAGGAAGAATGGGACTTAAAAAAATATCTAAAAACCCATGTGTTCTAGTTTTTC 
ATTTGTTATTAGCTTATTATACTTTACATTATTATTTTTGCTATAATCTAGAAAAAAAAAAGTAGACTTT 
AGATCTAATGTATAATTGGTATATTGATAGTTTTTTAATGTTTTTTTTATTAAATCATTTCATTTATTTG 
GTCTTCTTTGTTTTGGTATTGTCTATGTGGGGTGGCGGAGTTGGGTGCAACGCAAACAAAAATATTTTTT 
AGCAATTAAGTTTTTGCCGTACTGTATGGAAATTAGTTCCATTATGATAGCATTTTGCATCTTTGATTAA 
TTTTTATCATTCCATAGCAACAATTACTTCTTTCTCCTCCGGTGTCAATCAATCCCATATAGGTCTTGCA 
TTGTTTTGTCAAACGTTTCAAATTGGGAATTGTTTAGTTTGAAAAACTATAGATTTCCTTATCTTGATTC 
AGATCTCTCTCTCAGCCATGCTTATGTAACTTAGCTATTGTTTCTGTTATTGTTATTGTTGTTTGGTGAT 
TATCGACATTTGGGTTCATTTTATAAAAGCAAACGAGAGATCGATAGCAATTATAAAAACCATTACACAC 
ACCCAAAAAAATCAAAGTAATATGTTATCTAATAGGACAACTGATGTATCCTTTAATTTAAATATTTTGG 
AATAAAAGTACACCCCTTTCCATCATATTCATGTGCAATTTAAAAGGAATCAATTATCAAAAACCCAACT 
AACCAACAAGTTTCTGGTATATAGCCTTTCTGTCCAATTTTTTTTTTTTTTTTGAAATCTAAACTACTGG 
CCTCTTTAAACTAAAATCAAAGATCACTTCTTAATTAGTTTTGTAGATCCAGAATCGTTACCAATACTGT 
TAATAAATGATTGAATGATGTAATTTCAAATAGCAATCGTTGAGTATATTATAATCAATGAATAGCTAGA 
TTTAGAGACAATTATAATAATAACGAATCATCACAAAAAAAAAAGTGGTGTACAGAAAACGTATGTATGT 
AAACTAGATACAATGGAAAGGGCTGGGAGCGGAGGGGGGGGGGGGGGGTTTAATTCTGATTAAGAAAAAA 
AGGGGAAGGACATGGAATTTATCCACATGAGAGAAAGGGTTCCTAAAAGATGTCCTTTACGGTGGGCCCG 
GGGAACCCCAATTTTCAGAAATTTCACCTGTTTGGGGCGCATAATGTTCACAACCCAGGGTTGCCTTAAT 
GACGTATTCTTTACAATTTCATCAAACCAGTTGTTGTTGTTTAAATAAAAGTTGATAGTTGTATTGCTCA 
AATTCAAGGGGGGAGGGGGTGGTGAATTCATATTTCTCATATATCACACTCATATTTGCGAATACTTGAA 
TTACTCTACATTTATGCTTTTCACATGGATCAATTTAATATAAGTACATCAATCCAATATGAACATGAAT 
GTACCAACTAAAATTAGGTGTTAGTCTGAATTCTTGTTCACCATTGTTTAGTTTTGTTTGTGATGAATCT 
CAAGATACAGATTGGTTTTACAATAATACGTTTGTTGTTGCTGTATGAACAGGCAGTCACCCTTCCTCCC 
CCACAAAAACATATTCTGTATAATCTATGTAATATTATAAGATCCAATCAAAACATCACCACCAAATAAT 
ACTGTAGTAATGCCTAATCTAATTACTAAATAGAAATATAGAATGGGGTATGGTTGAGATTTTTGGGTAA 
GGTCCAATTTGCCAAAAAAAAAAAAATATGCAACCTTTTTCCCTCCTCCACCTCCTTCCTATTTCGTGAA 
ATTCGGTAGAATCCGAAAGACTAATGAAGAAAAAATCAAGAAAAAAGGTTAAGGTCATTGATCAATTGAT 
GGCAAATATGTAAGTAAGTTCGAT 

>retrotransposon_04 3504bp public: 1..466/2581..3504, Incyte: 467..2580; Tca1-like LTR: 688..1075

TTTTCTCTTCTAGCTTGCAATTTTTGTTGACGTTTACTAGTAGCAGAATTGGTTTGTTTAGTTTCTGCTT 
GTTGTTCCTCTGGTGTAGAGCCATTTGATTTATTCTTTTTAATGAATGGTAAAATAAAATTACTCAATTT 
GTAAATAGCAAATCCAGGAATTATCAAGTACCCATACCATACTTTATTACTTCCAAAAATAATCATCAAA 
ATATCGAACCCCCAAGTCAAATAGATAACATCAAAATAATATTCATATAAACTCCCCAGTAATCTAATGT 
CTTCACCACTTGAAACTAAAGAGTTACCATTGGTATATTTGGGACGACCAAATTTTTCCAAAGAATATTG 
TAAAAATATACTTGGGATGGAGAAAATTATCCACGGTTTATAGGAAGATGGACGATGGAAAATGGAGATA 
ATTAAAAACACAATAATGTTAATTGATGCGGAAATGATTAATAATTGATTTAATATGTTGGTATTGGCTA 
CTGCCAACTTCTTAGCTGATGCAGATGCCATTGTTAATATTGTTAAATTGGGTAAATAGTATGAAGGAAG 
CTTTGGCAGGCGTTGTTATTTTTTTCACCAATTATTATCATCACCTGCGGAGGTTAGTCAATTTGAGATT 
GTGCGAGGGAAAAAAAACGACCTCCATACACTACCTCAAGTATAAGTCCAGTCCAATTGTTCGCTATAGA 
GAGATTTCCTAGCCGGAATGCACGACAATCCTGAGACGGAAGTCGATCGTCGATGCCCATGGTGCGTGGT 
GAAAAATTTTCTTAGAAAATTTGTTCTTTCCTTCAACTGCTTTGAAGAGAGGGAGGTTCAAGTGGTTTAA 
GTACGACGGTCACAAAGATTGCGGCTTATGAGGCCCGAACTGAGTTGAAATACAAAATCAAGATATAATT 
ATATACCTTACTTGTCTATATTGTTTTATAATACATTCTTCAGATATTTAAATTTCTGTGTATCATCCTA 
TAAAACAGAGATACATTCAGTGCATTTAGTATACTGAGTGAACTGGTACCTGTGACATTCAAGATAACTG 
TTTCACGCACGCTGGCAGACGAACACCAATAGTATGATGAAGAACTGACCATGGTGTAAGAGGTTTGATG 
GAGTTTCTTTTTTTTAGAAGAGGTTGATAAGCCAACAGATGAGGAGTAACAAGTAACTCGCAACATTGTA 
TAACATAAGTTTACATCAAATCAGAATTTACT^ 

AAAAACGAGCTTAATGAGTAGACGGTCTGTTCATATGAAACAATTGAAAGGGTTGAATATTGTTTGGAAA 
ATTATATAATTCATGTCAAACTGGGAGGCTTAAATTATGGTCACTCCACAGATTATGAAACGTAGTTACA 



CAATTCTTGGACCTGGAAATCCCACAAGAGAGCGTTAGTTAGTTTGCACTCTCCTCACCAGTTAAACTAC 
CCATGATTCTCCAATGTGGCTTATTTAAGTATCAGACAACAGATACATGGTTTCCAAGTGGTCTCATTTT 
TGGTTTACTGGAGTCTGCATTCCCCACAAAAGTACCTTTCAAAACTAATTAATGTAGCTTCTATTTGATA 
GCCTCTGTTATGGAAATAGATTTGCTCTGCCCAGTGGGTGTAATTATTCCCAGCTGGAACTATTCCGATA 
GATATGTTTTAATGTCAATTTAAATCTTGTAATAATAGTAAGGATGCGGTTTATCCGCGATCTTCTTAAT 
ACCTGTGGAGTTACTCCAGAACAGAGGTTCAATTTTTTCTTGGTTGGTAAATTATCCGAGTAACACGGGG 
TAGCTTGGTTACTCCAGTTGAGAATGTAAACTATAGATGAAGATTTCAACACGCAATTATTACCCCACCT 
TGGCGAATTACTAATCGACTATTTGTTAATCCAGAAAAAATTATACACAAACACTGCCTTTTTTTAAAAA 
AAGCGTTATTTTGATGGAACGATAATTAACGATGGTTCTGCACAAAAATGTGGTCCAAAGCCCCAGACTA
TTCTGAAGTATGATTTGTTACTTAATTTAGTGAATAATTAAACATAAAATCTGGAGAAAAATTTTTTTTT 
TGCTCTCATGACCAGTGGCAAATTCTTGGTAACGAGGCTTAACATTAATCCGCAAATTACCTGGCAACAG 
AGAAAACACCCAGAAAGTTCTGTCGTATGAGAAAACCTACAGTTGTTTCCGATTTCTCCGAGCACTAAAC 
ATAAAGAGACCAGTAATGCTAAAAAAATTTTTATTTCTGCATTACTGTTTTTAGCAAATACACGTCTAAT 
TTATTGTATTTGTTAAACATTCTTTTCCTGAAATTTTAAGAAAATGTTTTGGTTTGTTGGAATTCCATTT 
AAACGGTACTTTGGGGTGCAGACAGCAATCCATTTGGAGAGTGGCAAGTCTACACGAATTTAGCTAAGGT 
TCACTATATCGTGTAACAAGAAATTTCTATACCAAATAAACAGCACTTGATTGAACTACAATATGTAAAA 
ACTTGCTTTTATTACCAGTCTTCATACATACCCCGGTCTTCTCTTTTCAATATTCTGTATATGTCTTTAC 
AACTCTTAACACTCCGTAAATGTGCCTTTCGAATACTTTTGCAGCTGGATATTTTTCCGGTGCACCTTTT 
CAGTTATCTTTTGCAACTTTTCGCGAGCAATGACAAAAGTTTGGGGCGTGAGGCAACAAAATGCATGGCA 
TTACCAGTACAGTATCGCCACAAGTGGTTTTCCTTGGCATTTCTTGATTGTTTAGTAGAACAATTCAATA 
AGACTTTTTTGATCATGAATTTTTTTTGCCATGAAGGTGCTTTCATTGTTCAAGGTTGAAGGGGAATTGA 
AAAATTTGTAGAGTCACAATCAAATGACTTGATAATTTGATAGAAAAAAAAAAGAAACCTTAAAAAATAT 
TCATACCAATGTATGCATAACCATAAAGAACTTACTAATTATGCACCTGCAATCAGAAAGTCATTTCTTA 
CGATGATTTGCCAAATGACCGTAAAACGACTAGCAAAAACAGTGACATTTTTTTTGAAAAGGTGGAGATG 
AAAACCATTCTGGTTTGTTTCGTCATTTACACAAATATTCGACACAAAAACTATTAATTCAATACAAACA 
AAAAAATGTGCAGGAAGTCTTGGAACCGATACAAAAATTTTTACAAACCACGTACACTATTGTTTTGGGG 
AAGAATTAGTCGGGGAAGAAGGCCCAGAAACTTGAGTAAAGAGTGGATTCAACACTTTATAATAGTATCA 
TTTTGTAACACAAAAATGAAATACACCCAATAAAAACTGTTGAAACATTTATCCGTCAAGCTTATTCGAT 
GGAGTACAACACTTTACATTTCTTCCGAAACAATAACTATATAAACCCATGTAAGTCTCCCCTCTTTTGT 
TTCAAACGTCTTATCAATTTTTCTCTTCACTACTTTTCCAACTTAACAATCTTCACTTATAATCTCAACG 
AATC 

>retrotransposon_05 3955bp Incyte: 1..3955; Tca1-like LTR: 2656..3043

TGTTAATTGATACTAAGTGTAATTGATTGGAATACTAGAAAAAAAAGAAAGAAGAAGAAAAGAAAGAAGA 

AAAAACTCAACTTTCTTTCGAAAATCAAGGATCAATGTTGGTATTTATATACTTTTTTTTTTAGTCAAAC 

TCTACGAAATGAAATTCAAAGAGAATAATCCACAGAAGAGGAGAGAGGGCAAAAGTGGGGGGACCAAAGG 

GGGTTAGAAAACAGGAAACAGCAATAGAGAGCAATAATTGAAAAATAGTGTTGTCAACAATAGAACAAAT 

TGGTCAAACTTTAAATGCAAAACATGAAATTCCCAATTTCCAGAATAAATAATATCAGCATACATGGCCC 

CGAAAACTACTTTACCGTGTCGCTTTAACCCCCCCCTTCCTAAAACGAGACAATTAGACATACATTCCAC 

AATTATCATAATCCCCTTTTTTTTCCTTACAAAACACTTTATTTTTGTCGTTTTCGTTATTTGCTTCGAC 

GACATTGTAAACTCTTTGGATTTGCAGTAGTAGTGCTCCTGGTGTAAGGTGGGTTTGGTTGTAGAGTAAA 

AGAAACGACAATTGATTACACCTCGATATGCATACGCATGGCAAAGAGAATACCGAGTTAATAGTGAGTC 

TATTAGTGTTGCAGGAAAAGTTATACGAACAACATTTTGTTTAGTGTGGATATTCCAGATCAACAACAAT 

ATGACTAAAATCATAGCTCTAATTTTCAGTTTACCTTTGTTTATTACGATACTGCCACAGTCGTGCTGTA 

CCAGGGTCAGTTTTAGAAAAACTATTCTAGAAATGATGAGTAGAAATGTACTATTATGAGCAATATTTCA 

AAAAGTGAAATTATAATTGCTGCTGACAACACCAACAATACATACAAATTTGGAAACGAGCAAATCGAGA 

AAATTTCAATCCGTTTAGCAAGTTGTTCGTTGTCGTCATTGTCGATTAGTTTCAGTTTCTAGAGGTGAAA 

TTTTCTATGGCACCAAAACCAAAGCCTCAATTTTAATTTACTCTGTGTGGTACAAAATACATTAGAGAGG 

ATCCTCTCCAAACAGGATTGCAGGAAGTTTTACACGAGAATGATTTACTACACGACGTTGAATTAAAAAG 

CTCAACCAGTTTGTCAGCAATTTTGTTCTATCTGTTCAATTTCTTGTATAAAATAAAGCAATATGAGAGA 

GCATCTAAATCAATAATGTCAACACAATATTAAACTTTGAGAAGGATTGTTCAACAAAACAATCCGATGA 

ATAGAAGAAGAATAATATCAAATTGTTCCTGATTGATTGTTGTTATTTATTTTTTATCTCCGAATTCCTG 

CACAATGGCTCAACAACAGCCAACACGGATCACACATTAAATTTTTTTTTCGTGCAGGACCCCGTGGTGG 

TGGCTGTGGCTGTGATTGTGATCATTGTAGTTTCTGCCTTGATGATGACAAAAAATGATAGAGTTCAGTA 

TGAGGAAGAAATTAAGCGATATCGGTTTATGATGTGTTTAGTTATTAATTGCTCTCAATGGTTTTCAACA 



ACGTATACAAAACTGGTGGTGCTTGAAACGAATGAGTAATACAGATCTAATTAAGCTGTGATTTTCTAAG 
TTTGCCTTGTCTCTACAGTTCAAAAAAAAAGAACAGAACACCTCAGAGGCTGTTGTGATGCAATTTTTAG 
GAACCTCAACAACAACCACTGACTGATCTAAGCCAGCATCTGTTTAATGGGTTTTCAAAAAGAATGGGGC 
AAACGGGGAATTGAACCCCGGGCCTCCTCGAATTTTGTGTTTGGTGAACAACCCAAACGAGGAATCATAC 
CACTAGACCATTCGCCCAATTCGATGACTTGGAATTATTCTAGTTATTTTTGACATACAAAGCTCAGCTT 
TATTACAGATAGTCATGTTTGCATGGATGAATTAGTACTACTAATAATATAAGAAAACTAGTTAATTGGA 
GTCAATGTCTTATACATGTCTTCTGATGGGTTATGCATTGATTAATTATGAATTTCTTTTAAATACAATC 
TATTGCTATTATTTGTATGTAAAACTTTACCCAAAAACCAACAAAAAAGAGTGGTCTTGGATAAAGATTA 
AAGTAATTCCAAAAAGATTTGGTAATTAGCTATATTGTTTTGACGTACATCTATAACTACAAATAGCCAT 
TCAGTTTGATTATGTATATTGACATAGTTGGATTTGTAATTTCTGTTAAAATGGAAAACCCTAATCAAAT 
GTATATGTTGAATAGGTAGTTAAATTGTACAACCTACTACTTGTTGTCAATTGAATTCAGAGCCAATACT 
TATATCTCCTGGAAACTGATACACAAACGAATTGTTAAACTATAACACTCGACGTTCACATCTAAGGATT 
CATCGTCGTTAAGATTTATACTCATTAGCAAACTCACTTGCCATATTAAACACTTCTCAATCTATTTCCC 
ACAATCCAATTAATCAGCACGAAAACTAAGATACTATATATATCTGCCTATACCTGATATACACATGGCA 
CATGGCGTATCCCACAAAAAACCGTCAAGACAACACCAATATGACAATGCCAATTATACAATTGCATATA 
CCACGTGACTTCATTTTATGGTCATGAGAAATTAACTTATCATGGGGTTAGGCGAGAATATCAACTGTTC 
GCTATAGAGAGATTTCCTAGCCGGAATGCACGACAATCCTGAGACGGAAGTCGATCGACGATGCCCATGG 
TGCGTGGTGAAAAATTTTCTTAGAAAATTTGTTCTTTCCTTCAACTGCTTTGAAGAAAGGGAGGTTCAAG 
TGGTTTAAGTACGACGGTCACAAAGATTGCGGCTTATGAGGCCCGAACTGAGTTGAAATACAAAATCAAG 
ATATAATTATATACCTTACTTGTCTATATTGTTTTATAATACATTCTTCAGATATTTAAATTTCTGTGTA 
TCATTCTATAAAACAGAGATACATTCAGTACATTTAGTATACTGAGTGAACTGGTACCTGTGACATTCAA
GATAACTGTTTCGCGCACGCTGGCAGACGAACATCAACACTGATCATTTGTTTTTTTTTTATTTCTCCTT 
TTTCTCCTTTTTCTTTCTTTTTTCTTCTTTCTTCAGACGTTGTTGATTTATTTTATCGACAGCATCCTTT 
TCTTTGGCCACATATCCAAGCGATATACTGGCCAAAGCGAAGTCCTTTTATAAAGCAATGCTACCAAATG 
TAACAGTTCGAGGTCAGAAGATTAAGCGGGTATGTTCACACGGATATTTTATGGGGTATCACTTGTACCA 
AACACTTTGATACGATAAGAATATTTGTAATACTAACTTCAGTGTCTTTCATAATCAGCTCATAACCTGT 
TGGAATTTAAATTCGTATGTTGTTCATTCAAAATTTTGATAAATGGGACGAGAAATCATCGTTGCCTCCT 
AATTAGATTATGACTTAGTACTAACTAAACTGTTTATCATTTTTTAAAGCGTTGGGCTCCATGTTAGAAT 
AGATTATTAGGGCGGTACGTATTTCATAATTTATATATAGGTACTTATTTTTACTAATTTATTGCACAGG 
AAAAGATAAAAGGTATCGATTATACCTATCAGCAAGGTTTAAGCAAAATGAAGTATTTTTACCATATTTT 
TCCATTTTTATATAGATACATCAAGAGGTTTATTTTAAGTTCACCTGGATAAACCATTCAACTAACCCAA 
TTGAATTGAATGACAATTTGATCTCCAAAGAGGGATTCATTTCTATTCTGGAGAGATAAACGTCATTGTT 
TAGGAAAGAGCAAGAGATAAGAAATCTTTTGTATATTGTATATATATTATTAATGTTATATTACACTATT 
GTTTGTTTGTTTGTTATAATTATATGTGAGATTTCATATGTAAGATGTTGTTATCTCTTTCCATTATTTA 
GCTTTTTTGAAAAAGCTATCAATGGCTCCACGTTT 

>retrotransposon_06 1434bp public: 1..1434; Tca1-like LTR: 87..475

TAGATGCAATAGGTGTATGAAATGTATCTAGATTATATCATGAAGCCCTTGCAATAAAATCTAGCCAAAA 

ATTTGTGTACTGCAATTGTTCGCTATAGAGAGATATCCTAGCCGGAATGCACGACAATCCTGAGACGGAA

GTCGATCGTCGATGCCCATGGTGCGTGGTGAAAAATTTTCTTAGAAAATTTGTTCTTTCCTTCAACTGCT 

TTTAAGAGAAGGGAGGTTCAAGTGGTTTAAGTACGACGGTCACAAAGATTGCGGCTTATGAGGCCCGAAC 

TGAGTTGAAATACAAAATCAAGATATAATTATATACCTTACTTGTCTATATTGTTTTATAATACATTCTT 

CAGATATTTAAATTTCTGTGTATCATCCTATAAAACAGAGATACATTCAGTACATTTAGTATACTGAGTG 

AACTGGTACCTGTGACATTCAAGATAACTGTTTCGCGCACGCTGGCAGACGAACAGCAATTCTGTAATTG 

TCGTAGAGTAGCAACAAATCTTCCCGATGATTGGTACTTGTGTTAGTCTACACGACATGTGTTTTGGTAC 

ACTTGAACTGTATGTCCAAGAATGGAAACATATGCGGGAAGGACGCGAAAGATGAGTTTGGTATAGAAGG 

GATAAGAACTGTAAAATATATTATGTAGTTATATATTTTAATTATGGGAAATTGAGTGTTTATTCTGTTC 

AACAAGTTTCAACCGTAGAGATTACATTTAAAGTCTGTGGTCGAAATCCACAAGATACAGCAAATTCATG

AATTCACCTATTTAAATCAAGTTTACCAAGCACCATTGCCTAGAACTTGCCATATCATCAATTAAGTCAG 

ACATTACTAATTTGAGCAAAGCTTTTAGCTTAATGGGCCAACTAATTTAAGTCGAATTGGTAATGCAATC 

TGTTCTTCATTTGAGTCGCTTGCTACGGCTCCATGACACATCCATTTGATTGTTTTAATTCGAGCAATTA 

TCCACCATAACTCTCAGTAATATCATTAACAGTTTTACGCTTAATAAGCATAGAAAGTTGTATGAAGTTG 

TCTCCTAGGTATGCTAGAGAGATTTGTATATACGACCAGTAAAGAGTGTGATGAGGTGTTTACTGTAGGG 

TAAATTGCAATTGACTTGAGTTGATAGCGGTTATTACAAAAGTATAGATTCAACAAATTAAGACAAGTAC 

CAAACGATAGGCCGAATGTGACTTATACCGTTGAAGTTCAAGCGTTTTTAACAAATAGAAATGTGAGATT 

AATGAGTTCGACAAATGTTTTACTAGATACTATTAATTTCGATGTACTATATAAGTTTAACCAGCTATAA 
CCGGCAGAGCAGACTTCCTGAAACTCAAATTGGTTGTGTTTGGACTTGAGTTACACCACAAAGTTTGACA 
ATCGTGAGGACATAGCAACCTATCAAGCCACTCA 

>retrotransposon_07 1608bp Incyte: 1..1030, public: 1031..1608; Tca1-like LTR: 1048..1435

TGCTAGTATGTATTTTGGCTCTTTGATCCTGAATGCGACAATGCAATACAAATAGTAGAAATAATGATGG 

TGATACTACTAGTATTAATAATAATCCGAGAAACGATATCACAAAATAAATCAGTGCCCAATGAGGTTGA 

TGCACAAATATTAGTGGTGTGTAAAACTAAAGAGAATATCTCGCTATGATTTCTATTGATAAGAAAAGAT 

GAGAGATTAAGGAAATATCTTCTGTAAAGTTGTATCGCCACCTTTTTTTTTTGTAGTAGTAGTATCGGTT 

TTGGTTTTGGTTTTCTCATTAGTTAAGATTCTTGCGATAAGGCACGACCTTGATCATTTGCATGTTTCTC 

GTTTAATTGTTTTTATTTCTTTTTTTTTTATGGTGTGTGGTAGTAGTTACAGATATCGACGGTTGCAAGT 

GCACGAGTGCTGCGACTGACCGGATCGTCATGCTAAAAGATTCAGGGGTGTGTAAGAGCGTGCCAAGTCG 

AGGAGGAACCAACATTTCACAACTGCTTCAGGATAGGGCATTCTTTTTCTTCTTTCTATTTGATCTAGCC 

TTGCGTCTATTCGTGTTGTTGGTTGGTACAAGCGAATATCCCAATAAGGTTTTTGTTGCCTATGTGCATC 

GTGTTGTAGCATAGTAACGAGAGATACGATTCTTCTTCTTCTCCTTCCCCTTTTCTTTGGATTGCTTTAT 

ATTTATATATATATATTGTCATCATCGTCACGAAATTCACTATCATTATCAATTATTTTGTTTTTTCTCT 

ATCTTTGTCCTCCTCGTTTAATCCTTATCACAGTTTTGGGTTGTTGCAATTTCTTTTCATTCTCCAGTTG 

AGGCTTACACTTTCTCTTGGAGTTTCCGTTTATAATTTTTACACACACAAAAGCACAAACTACACTTTGT 

CTTCACAGTGTATAACAGATACCACAGTATTACTAAGGGGGAAAACTAACCTAACCAAAGGGACTGACAA 

AATAAGTGGAAAGACTACAAATGACGCCCTTAATATACGAGAGAGAATTGAAAAGACATACACATAATGT 

TCGCTATAGAGAGATTTCCTAGCCGGAATGCACGACAATCCTGAGACGGAAGTCGATCGTCGATGCCCAT 

GGTGCGTGGTGAAAAATTTTCTTAGAAAATTTGTTCTTTCCTTCAACTGCTTTTAAGAAAGGGAGGTTCA 

AGTGGTTTAAGTACGACGGTCACAAAGATTGCGGCTTATGAGGCCCGAACTGAGTTGAAATACAAAATCA 

AGATATAATTATATACCTTACTTGTCCATATTGTTTTATAATACATTCTTCAGATATTTAAATTTCTGTG 

TATCAACCTATAAAACAGAGATACATTCAGTGCATTTAGTATACTGAGTGAACTGGTACCTGTGACATTC 

AAGATAACTGTTTCGCGCACGCTGGCAGACGAACAATTGCGGCGAAAAAAAAAAGAGGTCGCCAAAACTA 

AACTGTTGGGACGATTTGCTGCCAATCACAATGAAAAAAAAAAAGAACAGTTGGTTTGAAACTTCTTCCT 

CTAATACAGAATTAACTGATCTTTCTATCACTGTTTAAACTATTCATTACTCTCAAGAACTTACCATG 

>retrotransposon_08 1385bp Incyte: 1..1385; Tca2-like LTR: 49..328

AATAAGTGGATTTATCATTACTATTATCGTAATGCTCAATCAGGGGAGTGTTGGTTTGTGCACTATTTTG 

TGTCAGAAACTGATCAATGAAAATGATGGTTATTATGAGAATGGAAAATTTTTCCATCACACATCAGGTG 

ATGACAGAACTAAACTATATTGTGTAGTATAAATAAGGGTATGAAATACCAACATCCCAGAATATCAACG 

AGATAGAAGGGAGGAGTTTCAATATATATCTTGTGAATAATAACTTCGTTCTAATTCACTATACACAACT 

AGACGTGTACACGCTCAATCTCAGGTAAAGAAAGTTTATATTCCATCAACAGTACTAGTATTAGTATTAG 

TAGTTGCTTTGTCATATACAAATAGATTAATTAAACTAACTAACAACCTATATCAAATCAAATCATCAGT 

TATATCATCATCAACATATTCATCATCTTTATTCATTCTATAAATTGTCATTGCCATACTTGCAAAATTC 

AATAAACTCATAATCCAATCCGGCAAAGCAATTCCATATAATTCAATGAGATTAAATGTTAAATCTAAGA 

AATTCCCAATTAATTCAATAATAAGCATCATTTTATCAAATCGTAAATCTTTTAATACTTTTTTGTATTT 

TTTATTTAAATCTTCATTTATAAAATTTATTCCAGTCTTGTTTTTAGTGGTGGTAGTAGAATTTAATAAA 

TCAACTTCAATATTAACTTTTCTAATTTTACGTATTACATTTAGTAATTGAGATATGGTTTTCCTGATTA 

AAAAAACCAATATTAATACCCAAATTTTATTGGTTTGTTTTAAAAATCGATTTAAAAATTGTGGGAACAT 

TGGTAAATTTGATAATAAATGTAAATTATCTAATAAATTGGCAAGATTTTCTAAAATATTAACAAACATA 

AATTCTATTTTTTTCAAACTAAATGTATTTGGTCTATAGTATTTTATAGGTTTATTATTATTATTAGGTT 

TACTCCCTGACTTGGGTTTCTTCACTGGAGATTGACCTCGTTCTTGTCGATTGTTGTGAGATGATTTATT 

AATATCAAATTTATTAAATACTGAAGGGTATTTTGGTTTTGGAGGTAATTTAGCCTTAGTAGGGGTTGAT 

AATGGTTGTGATCGACTTTGTAACTTTTGTTGTTGTTGTTGTTGTGCTAGTAAAATGGTTAATTTATCAA 

GTTTATCTGATGTGATTGAAGTATTACCCTGTTGTTGTTCTTTTTGAGCTAGAAGAAGTAAATTATTGAT 

AATTTATTGTTGACGTGAGTCAGGATTAGGATCAATTGAAGTATGTTTTAAGTTTAATTTTTGAATTAAA 

TCAATATTCTCCTGTATTGTTGTAGTGAACATTACGGATATTAATAATAAATAAA 

>retrotransposon_09 1483bp public: 1..525, Incyte: 526..1483; Tca2-like LTR: 871..1150

TGAATAATCAGGGGATGCAAGTTATTGATTTTGCCAGTATCCAATTTTACTTGTGGTTTCGAGAAAGTTC 
TTTCTCTCATTGGTAGTTTAAAGTTAACTGAAATTCAAATTATAGGAGTTTTTGAACATAAAAAGCATAT 
ACAACTTGAGTAGCATGTATATATTGCATATAAAGATTCTTTTTTTTTGTAATTGAGTTTGCCAAACATT 
TTAGTCACTCCCAATATATCGTCAACTCGTAAATGTGATAATTCAGGTCAAGTGCCTACCTCTAACGATT 
AGCCAACATTTTTTGAAACAAAAATATATTTCAAAGGAACACAGTGAAAACCTCTCTATGTAGGCTGACA 
GGTGAAAATTATGAATTAATTGCATTGGCCAATGACAAATGAATAGACAAAACAGCAAATAAGGTTGCAA 
AAGTAGCCCAAACAAACTAGATTTCGGTTACGAATTTTCCATCTTTCAAAACAATGAATTTGTTTAGAGC 
TCTGTGCCATTTATTGCAACTAAAATGAATATGCAATTAAACAATCAGAGATGTATTGGATTATCCCCGT 
GGTATACTTTTGAGTTCACCATTTGTTTTTTTTTTGGGGTTAAATTAGTGCTCCTACTAAAAATCGCATT 
TATCTTACACTCACCATTTTGATAAGTTATCTCTGGTCAATCGCAAATACTATGCTTCTAATTAAGAGTT 
CTATGTAAATCCCATTTATTTTGATCAATCTATTGGTTTGAAGTAAGAGTTGATTTTCTGTAAAGATTTA 
TTTGACAGTGTAGTTCGGTGTCAAAAATATATTATGATGTACACTAAAAAACACTAAATTTCAAGTCAAT 
GGGGAACACAAAACTGAATTAATTACTATATGTTGGTTTGTGCACTATTTTGTGTCAGAAACTGATCAAT 
GAAAATGATGGTTATTATGAGAATGGAAAATTTTTCCATCACACATCAGGTGATGACAGAACTAAACTAT 
ATTGTGTAGTATAAATAAGGGTATGAAATACCAACATCCCAGAATATCAACGAGATAGAAGAGAGGAGTT 
TCAATATATATCTTGTGAATAATAACTTCGTTCTAATTCACTATACACAACTAGACGTGTACACGCTCAA 
TCTCAGGTAAAGAAAGTTTATATTCCATCACTATATAACAACAATCAGGCTTTGCAAAAAAACATTTAAA 
ACTAATACTGGTAATATGGAAATATAACGCCTCGTAGTTCTACGCACGTGGCATCCTTTATCTATTTATT 
CAATTTACCCCTAATTTATGAATTAGCTTAATAAGAGCAGTCAAATTAACACGGCTCAATTAATAGTACT 
TAATAATATGAAGCCGATCAATTAACCGATCCTTTGAATAATTTGAAAATAAAATAAAGTAATATAAATA 
GGTATGCATTTTCCCTACATTTATTTCCTCTTTCTATTTTAATTTGTTTCCTAAACAGCAACAACAACAA 
TTGAAATTCAAAA 

>retrotransposon_10 879bp public: 1..879; Tca2-like LTR: 326..605

GGCTCGTAGATTCGGTATACTTGTCTAGAATAAAAATGAAAATGAATGTTAGTTGAAATGTCAGGTGGTG 

GTGGTGGTTTTTTTTTAGATTTCAAAAACTATACATACTCCTATGAGATCAATTTTCTTGATTGAATATC 

TTGGTAAAATGGTTATGAGTTCATTTTCTGCCAAAAAGGTAATTTCTGATGGCATAAGATTCCCTTGAAG 

GTTTTTTGGGAGTACCATGACGGGTTAAGGATTATTTGTTAATGGTTAAAACTAGATAGTAGTAGTCTAT 

ATTTAATTTATTTTTTTTTTTTTGACACCTTGTGCGAAAGATCTCTGTTGGTTTGTACACTATTTTGTGT 

CAGAAACTGATCAATGAAAATGATGGTTATTATGAGAATGGAAAATTTTTCCATCACACATCAGGTGATG 

ACAGAACTAAACTATATTGTGTAGTATAAATAAGGGTATGAAATACCAACATCCCAGAATATCAACTATA 

TAGAAGGGAGGAGTTTCAATATATATCTTGTGAATAATAACTTCGTTCTAATTCACTATACACAACTAGA 

CGTGTACACGCTCAATCTCAGGTAAAGAAAGTTTATATTCCATCAATCTCTCTCGATGTTGTAAAGAGAC 

GCGTCAATTAACAATAAACTCTAATTTTGTTTTTCTTCTACAAAACTACCAAACATAATCATGTCAAGGT 

AAATTACAATGATATTTAATTACGTAAATACTTCTATACCCTTATTGATATTCAATCATTTTCTTCTTAT 

ACGTGGAAGTTCTTCCAGATGTCATGGCCTTGGCCCTTCTAGCAGGTTTTGGACCGTCACTATCTCTACT 

ATACGGGTCAAATCCACGTCTCTGTCTACCATTAGTCTA 

>retrotransposon_11 974bp Incyte: 1..974; CTA2 (transcription factor): join(<974..>778,<223..>1), Tca2-like LTR: 483..761

ACCCGTCTAGTATCAGCTCGTCGTTTTCAAGTATGTTGTTCATGTCCAGGTTGTTGTCTGTGGTGGCAGG 

TACTTTGTCGTCCAATTTTAGGTCCTCGTAGTCCATGTTGGACAACATGTCTTCGTCGGTATTGCCGTTG 

ATGTCAAAGCCAATAAAGTCGTCAAAGTTGTCAAACTTTTGTGGGGCGGTCTCTGCTTTCTTTCTGGCCT 

CTGCTTTCTGTTTGTTTTACACTTTTCGTCTTTAATTATAGTTTCGAAGAATTTCCTAGGAACTTAAGAA 

TTTGTAGGAGAATGCTAATAAGAAGTTGTATTTCTTAATTGAAAGTTATAATTGTAAGAATATATTGTAT 

AAAAGATGAGTTGATAAAGAAAAGATATAAAAAGTCCTATAAAAAAGTATTGTAAAATAAAAGTATATAA 

AAATCAAGTAAAATAGAATATTTGCACACAAATTAAAAGTAGTGCAAATTTGACAGAAAAGTTGTTGGTT 

TGTGCACTATTTTGTGTCAGAAACTGATCTATGAAAATGATGGTTATTATGAGAATGAAAAATTTTTCTT 

TCACACATCAGGTGATGACAGAACTAAACTATATTGTGTAGTATAAATAAGGGATGAAATACCAACATCC 

CAGAATATCAACTATATAGAAGGCAGGAGTTTCAATATATATCTTGTGAATAATAACTTCGTTCTAATTC 

ACTATACACAACTAGGCGTGTACACGCTCAATCTCAGGTAAAGAAAGTTTATATTCCATCAAAAGTAAAA 

TAAAACACTTCTTCGCTTCCTCTGCTTTCTTGGCTTGCTCTGCCTTCTTGGCCTCTTCTTCCTTCTTTCT 

TGCCGCTTCTTCTTTGACTTTCAATTCGTCAAGTTTCTTTTTCTTTTCAACCATAACGCCGAGACACCAC 

TCTGCATCATTGAGTTTCGACACTGTTTGGTCTAGAATAGCATGGAAGTTTTGGATTTCGCCGT 

>retrotransposon_12 3868bp Incyte: 1..1295, public: 1296..3868; Tca2-like LTR: 127..407

AATGAAGTAACTTTTTTCAAGGCAACATCTATTCTTTTATTAATCTCGACGTCTGTTTGATTAAGTTGCT 
CTAACATTTTATTTAGATCCTTCTCTATATTTTCTGCAATATCAAACACCGATTGCTTTTTGTCTGAAGT 
TGCTGGTATATCACCACTTCCGCCAATTGTCGTATTTCCACTGTCCTTTGTTACTGACAGATTGGCACTG 
ACATTACCTGAATTGTTCATGTTTGCTGTTGAAAGAGCAGGAACTGTACTTGGATAAGCAGCCGATTCAA
AAGAAGATGTGGACATGAGTGTCAAGAAAATGTGTAGAATCAGTACAAGACTGGAAAACAGAAGGAACAA 
AGTGAACTGGATATTGTAGTTTTGTTGATAGTACTCGCGAGCTTTAATTTTTTTTTGTAACTGGCGGAAT 
CAGATCTTATGCAATACTCAAATCCAAAGAAACAGTCAATCCAGATGAAAGGCATGTAATCGCTAGTTTT 
CATAAACAGAATCATGTTACTAGTCATATTTTCTATAAAAATTCAATACTTCATTCTTTTTGTTCAATAC 
TAACTATAAATGCTTACAAATAGATTCAAATTTCAACCAGATCCACCACTTCATTAGGCTCAACCAATTC 
TTCATAAATAGAAACGTCTTCCTCAGCCAAGCTTAATTGATGGGAAACCCTAGCTTGCATTGAAGGAAAA 
ATACATAATCCAAATAANCAACTGTCTTTCCAAATATTCTCAAAATTCAACTTCACCGTCTTTCACCAAG
CAGGATCTCGTGATTGGACCAATTCTAATTCAGAAGTTCTTCTCACACAAGTCCGAACGACTCGATCCAT 
CATAATGGATACATCGTTCACGTTGCCACCAAATCGAATGACTCTGTTTGCACCTGTACAAAGTAGAACA 
TATGCATGGAAAAGTAAAACTAGTAAAACCGCATAATGAAACCAATAATTCATCATATGTTGATTGAGTC 
TGAACCCCATCAAATATAAAACAAAAGTGAGTTTAACCATAGTTATAAGAAGCAGTCTTCCGTTGGTGTA 
TAATCTATCCATAAGATCGTCAATTTCAGCATCTTCAACATCAATGTTATTAGCGTCACCTGGAACGGCT 
TGTTCATTAGATTCTGATTCCAGGTCACTACCAATATCATACATCATTACTAGTACTTTTTGAATCAATG 
GCTCACCAGAAGCCAGTTTAAACACCTTGTGAACTTTTGCTGCACCCATAGGACCGAGTAGTAGATAAGG 
ATCGTGCAAGCCGTTATCCACAACAATGCATTGTGCTGTACCCAAGCTTACTTTCTTCACAATATTGTCT 
ACTTTCAAAGTAAGTTCATACTCAACATTAGACAAGTCATCCTGTTTCACTAGAATTTTTTTCCCTGAAT 
GCTGTTCAACCATAGTATCGTACGATGTTCCCTCCATTTCCCATGTGGATCCACCACGTACCTGAATACT 
GGCAGGTTTAATGGGGTCTATGTTAGGAGTTGAAGACTCTGATGGATTATTGACAAATGGAATAGAGTCT 
TGTTGACTTGGCACCAGCGTTTCATAATTTGAAGGTGAAGGTACTGGGTTAGCCGAGGTTGGTGATGTTG 
AAATATCACTATCAATTCCTTGTTCTGAGGATGAGCTAGTAGCAGTTGGATTTGTTGTGCTTCTTGCAGC 
AGACAAATCTGATGTTGATTCTAATGGCACTGAATTCGACAGCGCCAAATTGGGTTGCTGTAAAGAGTCA 
TTGGTGGCAGGGAGAAATCTAAATCTATCATTTGACTGAAAGTCCTTCCAAAATTCTCTGCTCAACAACC 
CACCAGTTCCATTTACATGTTCATGCTTTGTAAGTTTCAATTTTATGACACTGTTATTCTGTTCCAAAAG 
CTCTTGATTCAATCCCAACAATTCATAAACACTAGCTTCCTCTTCTTGAAATGAGGTTGGTATTATATTC 
CCTTCGTATGATAGTTTTATTTGTTCTATAAATGTACGTGTGACAGAACCTTCGTCATTCTTAGCTATTA 
TTAATTGCTTGAGTTGCTTAACCGTAGTTCGGTCATTTATTTCAATCATTGACTTTTCATTCTGTAAATT 
AGGAAGATTTGACTCCAACAAAACCCGGAATCTTTTGAAATTACTATTCATTTCTAAAGGTTTGGGTTGT 
GTGATTGAAGCTAATGGTGTGTGTACTAAGTGGTTTTTCAATTATAAATATTGATGAACTACACTATATA 
TACACTGAGAAAAACACGACCAAAATTGACACCGCACTAAAAACACGGAATTACCGTATTCTTTTTGTTA 
ACGATTTTGTTTCATTACACGACTGTCGTTATACACACATTTAGAGCAAATTATTTTAGATTGATCAGTG 
TTAGCAACTGGCTATCGATAATAGAGTACCTTCCCGAGTTAGAATGTCTTATTAGAACAACAATTGTTTC 
ATATAAATTTGTCGCAAAGCACACGTAATATACTATATGGAAGGGGCTAAGTAAAAATGTCCCGTTTCTT 
CTTAATATGAGAACTCGTGTACGACACAATTTGCTGTGTTGTTAATCGAGTATGCTACAACCTGAAAATG 
GACCATAGACCCAAACTACTTCTCTCTTTCTAGCACCACAAACCCCACAATTAGCACAACAATGAATTGG 
ACTTCACTTGTATATCTATGGTTCATTTTCAAAAGCATATTTGCTGACTTAACATCACACCAACTCAAGA 
GCAAAGTGGTATTCCTAGATACTACTATCCTGGATGAAGTGGCCCGAAGCTATTTGGGATCAGAGGACGG 
AAATGTTACACATGGTAATTATGAAATATTGTCAATTGCAAATGGGCGCCAATGACGGAAACATCACATC 
ATATTTATGCCAGTTGCCAAGAACCAAAAAAATGGCACCAACAAAACCCAAGCCCACCATGTCAGTTCAT 
GAATTGAAATCGCGAGCTATTGACTTGATATCGGAATCCTTTGTCGAAGGTACCAGTTGCGTATTTTCTT 
TCAACTTGCATGCAAATTATTGGACTATAGGCTATTGCCATGGAATCAACGTTATTCAATTCCATGAGAA 
TTTGGATGATTTTATAAGCGGAATTCATAAACCCCATTCTCCAAATCATGTATATACATTAGGCAATTTC 
CTGAAGCAAACACTGCCATTAGAATTCGAGTTTGATACTAAAGAACGCACAATAAGTCAAAGATTGTTAG 
GAGAAGTTTGTGATTTGACAGGAGAACCACGTACCATTGACACCATTTATAGATGTGACCATATACTTGA 
AATTGTTGAATTAACAGAGATAAGAACATGTCAATATGAGTTACACATAAACGTTCCTAAGTTGTGCCTG 
TTGCCGGAATTTAAAAGGACTAACCTTGAAGAAGGTGTCTCAGAAATACTCTGTACAAGAATTGAATAAG 
CATTAAATTTAATAAAAAACATCAAAAAGTGTATGTCAAAGTATTTTTACCTTTGTAATTAGTAGTTTGT 
CAGTTTCTATATAAACATAGGGTAGTTCGTATATACGATATCGGAGCGATTCTAAATAAGTCGTGGAAAT 
TGGCCGACAATGGGATTTGAATTTTACTTGTGTGTGTGTGTGTGATCTGAATAATAGTAGTGCTAAACAA 
CTTAAATTAAAGAAAAAAAGACAAAACAAAAAAAATTAAATCTGCTTATTGAAAATTTTTCGAAATAGGC 
TAACCCGTGTTTATTAGATATTAGATAGTACGATTTGTTCAAGTGTCAAAGATAGCAAATTTTTATTGTT 
TCTTCTTTTTTATATACAGCTTGTTTTAATTTCAGGATCATTTTACACTAACCTACTCATCAGCCTATTT 
TAATTTATCCTTTTGGCT 

>retrotransposon_13 469bp Incyte: 1..301, public: 302..469; Tca2-like LTR: 75..355

TAACGAATGAATATAAAATACTTGTATTATGTAGTGCCAATAAAAGTTGAAACGGTCGCACTACTTTTTA 
GTCCTGTTGGTTTGTGCACTATTTTGTGTCAGAAACTGATCTATGAAAATGATGGTTATTATGAGAATGG 
AAAACTTTTCCATCACACATCAGGTGATGACAGAACTAAACTATATTGTATAGTATAAATAAGGGTATGA 
AATACCAACATCCCAGAATATTAATTATATAGAAGGGAAGGAGTTTTAATATATATCTTGTGAATAACAA 
CTTCGGTCTAATTCACTATACACAACTAGGCGTGTACACGCTCAATCTCAAGTAAAGAAAGTTTATATTC 
CATCAAGTCCCATCTGTTAAATATTTTTGTATCTTTTTATTTTTATTTTTTTTTCTTTTAATTTCATTTA 
CATACATTAACACATCTACTAACCATATATCACGAGATACAAAGGCAAG 

>retrotransposon_14 (direct) 4545bp Incyte: 1..4545; Tca3 LTR: 1..314, 4234..4545, POL fragment 1: 577..>3324, POL fragment 2: <3443..4201

TGACGATCCTGTATATTTCGTCATAATTCACACATTCTTAAAATTATGCACACATCCTTGAAATGTGTTA 

ATATTCCCAACATTATCAATTATATGTGTTCAGAATTGGTTGCAAAGTTATCAACTCAATTCACGCTATA 

TAAACCTTACAAATTCTCTACATTTTTATATTTTTTTATATTGGCTTTTCTTTTAGAATCAATCAATACT 

TTTTTTATCATTTAGATACATCTTTCATCTATTAATAGATTATCTTTCTATATATCAAAACACGACACAG 

TCACGTGCCAAAAAGGATATAAGAAGGAACTTCAGAAAATTAATTTTCTGATTATACTACTTACTAGATT 

GCATAAAGTCAATATCTGATTGATACAACTTGGTTCATTATTCATAAAACTTAACAACTAATTCAACAAG 

GAAACCCAACAAAAAAATCCAAATAAAATAATCAGGAAAATATTATAATTAATTAATTACAAAAAAAAAC 

AAAAAAATACACACACACATACACACACACAAAATCTTGTTGCAAAAAAAAAAAAATAATAATAATATAA 

TAAGAATTAATTAACAATGTCGTTTCCACGGACACATTCACCAAGACCATCTGGTTCACGAGAACAGGAA

GATCTCACACTGATGATTAAAGCTTTTAGAGATTCAATGGAAGCTAAGCTTGACTTGCATTCGCAGAAGC 

TTACTGCTTTGGTAGCAAACATTCCCAGAACGGACGAAGGGTTTGAAGATTTATCACAAAGGATCACTGT 

TCTTAAAAATCATCAAAAAGCATTTTTGCCCAAACAAGAAAAAGAAATCGGAAGTCTTCTCCACAGACAA 

AGAGAGGAAGAAGGTGATATTAAGGATTTCAAAACAGTCGTTGGTGAAGAAAAAGAAGAATTGCACCAGG 

TTGAAGATTTCGTTTTAAAAGATCAAGAAGAATTACGAAACGTCGAAAAGAAAGTTTTGAAAGAAGAAGA 

AGAATTGCAAAAAGTGGAAGAGTCAATGGAAAAGGAAAAACAAGAGTTATACCAGGTTGAAGACTTTATT 

TTGCAAAGAGATGAGACGGTAAAGAAACTTGGAGAAAGCAATCAATCTCAACAGGAACCATATACACCTG 

CAACTTCTGGTTCGGATCAGAGATTCAGATCTCAACAACCTAACATTGGAAATACCTTAGCGCAGGATCT 

AGCATTAATTCCAAAATTAGATCTGGAAATTTGCAAAATTGCAGTCAAATATCCAAAATTATTTGAAACA

AAATTAAGACCACCACCACCCAGAGACTTTCAATATAAAATTCAACTCACAGACCACACTCAAATTTATT

CAAAACCATATAAATGCAATCAAGAAGAACAAGCTCTCATTAAGGATTTCATCAATGAAAAATTAGAAGC 

AGGCGTTTTGGTACCAGCTCCAATTGATGCTTGGTTACACCCAATATTTCCAATCAGAAAAACCAATGCC 

AACCAATCCTCCACCAAAATAGCAGTTGATTTAAGACGTCTCAATAAGGTCACAGTACGAATGTACACTT 

ATCCAACAGACACAAAAGACCTCTTATCCTCACTAACAGATTCCCACTATTTTAGCGCTTTAGACTTAAA 

GAATGCGTTCTATCAGGTAAGCATACACAAGGATAGTATAAAATATTTTGGGATTTCAACATCCGAGGGG 

AATTATTGCTTTACAACTTTACCGTTTGGAGCAATCAATTCCCCAACCATCTTTACTAACTTTGTGAGAC 

AGATTTTAGAGGGGATCCCATGTATATTTATATACATGGATGATATCCTCATCCATACTAAAACCTTACA 

TGACCACATGTCATTACTCAGGAGAATCATGGAGAAACTAAATGAGCATCAGTTTCAAATGAATTATAAC 

AAGATGCAATTATTAACAACAAAAATCAATTTCTTAGGGTACAGCATTCAAGCGAACAAAATATCACCAG 

ATATTTCCAAAATTCAAGCAATACAAAATTGGGAATTGCCCACGACCACTACTCAAATCAGAGCATTTGT 

CAATTTCAGCAACCACTTTCGCATCTTCATCCCAGAAATAGCAAAATTTACTAATCCATTAAATGAATTA

TTGAAGAACAACAATGGTAAAAACATAAAGATTGAACACACCCAAGCATCCATTGATGGTTACAAGGCAT 

TAAAAGCCGCCATCATTGGATTGCCGACGCTTCAACTTTACAATCCAAAACTACCAACCATCATTTTCAC 

AGATGCTAGCCACATGGTAGTAGGAGGATATTTATGTCAACCAACATTCAGAAATGACAAAGAAGTCCTT 

GTCCCAATTGCATTTTCATCACATAAATTAACAGAAACACAAAGCAGATATGCTGCTATGGAAAAGGAAC 

TTTTGGCAATTATTGTGATATTGGAAAAATTTAGATATCACTGCAGCAATACGGTAGAGATCTATACAGA 

TTATCAAAGTTTGGCATCATATTTAGATAAGAAAACTACTCCACCACCGAGAATTGCTAGGTTTTTAGAT 

CTAATTGGATCATTTTCCCCAAAAGTGTACTATTTAAGTGGAAAGAAAAATTTCGTTGCTGATATCATTA 

CAAGATATCAAACTCAAAATATTAAGGAATTGGTAGATGAAGACAAGATACTAGGACAGACTTTTACAGT 

CAAGAGAAATTTGAAACAACAACTATTACCAAGATTGGAAGCAATTGAATTGGAAAATCTTAATGAATCA 

CAGGTTCACAAAATCCAAACTTCATTAGAACAACAACAACAACATGATTTGGAAGACAATGATGAAGAGT

TACCTCTCCAACTGTTTAAATTAATGAATGATGAGTTATTTGTAATCATTAACAACCAACTTTTAAAATA 

CCTTCCAAGACTGGAATACAATGATATTTGTCAAACAATCCATGACAAACACCATCCATCAACTAGAGTA 

ACAGACTACTTATGCACACTCGCATATTGGCATCCTGACCATCTATTAATTGCTACAAACATTACGAGAA 

AGTGTCACTATTGTCAACTAAACACGTCAATTCGTGAGGCCATTAGACCATACCGACCACTTGAACCACT 

CAAGGCATTTAGCAGATGGGGAATGGACTACTCTGGACCATACTTTAACACAGTCCAACACAGGTACATA 

TTAGTAGCCGTGGAATATGTCACTGGTTTAACTATTGCAGTACCAACATTGCACAAAGACGCAGATAACG 

CAATCAGTCTTTTACAATCAATCATTCTGATCATGTCAGCACCTACAGAATTAGTTACAGATCAAGGTAA

AAAAATTTTCATCACAAGCTTTGGCTACCCTATGTGACCAGAATAACATACAACACCATATTACCTCCGC 

CCACCACCCACGTGGGAATGGTCGGGTTGAGAAGGTGAACCACCTATTGAAGAAAATATTGAAAGCATTA 

ACTAACGATACGATGCAAGACTGGGATTTAAAACTATATGACGCTTTAAGAATCTACAATGCTACACCTA 

CAATTTTTAACTACACTCCACTTTATCTTGCACTTGGAATTGAACCACACCATAATTTAAATCAATTACA 

AAAAGATTTAATTGAAAATTTGCAAAAAGAATTGCCCCCAGAGGTCCAATCCACAGAAGAACACGAAGAA 

AACCCAAATGATGAACAACAAGAAGAGGGCAGAGAACAACAAATTTCAAGAGAAGAACAACAGGACGGCA 

GAGATCTTGTACACTTAAGAATTTACGAATTGGAAGCAATTAAGAAAGCTCGCAAGTTACACACAAATTT 

GAAAACACGAAGAAACGCAGTCCAAAATATGTTAAAGGAACCATATGGCATTCCAGCACCTTTTACAAAA

GGACAATGGGTATACAGAATTAGAGCTAAAGCACGAAAATATGAACCAAATTTCGATGGTCCATATCAAG 

TTCAAGAAGTATTAGGTAAAGGTGCTTATAAATTGAGAGACATCACTGGAAGAGAAAAAGGAATCTACAA 

TCAGGATCAATTGAAGTTAGCATATTCAGCAGACAACGACCCAATACAGGTTTTTAGTTCTTTCAATAAA 

GAATATGATCGAGTACAACAAAAATTGTTAGACAAAATTCAATCGGAAAGAGATCATCAATTAAATTGTT 

TGTCAGTCCAACATTTACACAGACAAAGAAGGTTACTCGATATATCCAGCTGTCTTGAGCAAATTCTGCA 

ATAATTTCGCTAATCATTGGAGGAAAGGGTAGATGACGATCCTGCATATTTCGTCATAATTCACACATTC 

TTAAAATTATGCACACATCCTTGAAATGTGTTAATATTCCCAACATTATCAATTATATGTGTTCAGAATT 

GGTTGCAAAGTTATCAACTCAATTCACGCTATATAAACCTTACAATTTCTCTACATTTTATATTTTTTTA 

TATTGGCTTTTCTTTTAGAATCAATCAATACTTTTTTATCATTTAGATACATCTTTCATCTATTAATAGA 

TTATCTTTCTATATATCAAAACACGACACAGTCACGTGCCAAAAAGGATATAAGAAGGAACTTCA 

>retrotransposon_14 POL fragment 1 916aa 

MSFPRTHSPRPSGSREQEDLTSMIKAFRDSMEAKLDLHSQKLTALVANIPRTDEGFEDLSQRITVLKNHQ
KAFLPKQEKEIGSLLHRQREEEGDIKDFKTVVGEEKEELHQVEDFVLKDQEELRNVEKKVLKEEEELQKV
EESMEKEKQELYQVEDFILQRDETVKKLGESNQSQQEPYTPATSGSDQRFRSQQPNIGNTLAQDLALIPK
LDSEICKIAVKYPKLFETKLRPPPPRDFQYKIQLTDHTQIYSKPYKCNQEEQALIKDFINEKLEAGVLVP
APIDAWLHPIFPIRKTNANQSSTKIAVDLRRLNKVTVRMYTYPTDTKDLLSSLTDSHYFSALDLKNAFYQ
VSIHKDSIKYFGISTSEGNYCFTTLPFGAINSPTIFTNFVRQILEGIPCIFIYMDDILIHTKTLHDHMSL
LRRIMEKLNEHQFQMNYNKMQLLTTKINFLGYSIQANKISPDISKIQAIQNWELPTTTTQIRAFVNFSNH
FRIFIPEIAKFTNPLNELLKNNNGKNIKIEHTQASIDGYKALKAAIIGLPTLQLYNPKLPTIIFTDASHM
VVGGYLCQPTFRNDKEVLVPIAFSSHKLTETQSRYAAMEKELLAIIVILEKFRYHCSNTVEIYTDYQSLA
SYLDKKTTPPPRIARFLDLIGSFSPKVYYLSGKKNFVADIITRYQTQNIKELVDEDKILGQTFTVKRNLK
QQLLPRLEAIELENLNESQVHKIQTSLEQQQQHDLEDNDEELPLQSFKLMNDELFVIINNQLLKYLPRSE
YNDICQTIHDKHHPSTRVTDYLCTLAYWHPDHLLIATNITRKCHYCQLNTSIREAIRPYRPLEPLKAFSR
WGMDYSGPYFNTVQHRYILVAVEYVTGLTIAVPTLHKDADNAISLLQSIISIMSAPTELVTDQGKKIFIT
SFGYPM

>retrotransposon_14 POL fragment 2 253aa 

MQDWDLKLYDALRIYNATPTIFNYTPLYLALGIEPHHNLNQLQKDLIENLQKELPPEVQSTEEHEENPND
EQQEEGREQQISREEQQDGRDLVHLRIYELEAIKKARKLHTNLKTRRNAVQNMLKEPYGIPAPFTKGQWV
YRIRAKARKYEPNFDGPYQVQEVLGKGAYKLRDITGREKGIYNQDQLKLAYSADNDPIQVFSSFNKEYDR
VQQKLLDKIQSERDHQLNCLSVQHLHRQRRLLDISSCLEQISQ

>retrotransposon_15 2093bp Incyte: 1..2093; Tca3-like LTR: 1509..1822

TTTTCCCACAAATAATATCAACAATATTTCATATTTTCCATCATGCTAGAGAAGATCAAGTTATAACTAC 

ATTAATTGGTTATGTTTATAAATTGACTCAAATTTGTTTAAAATTTGAATTACATTCTGAAATTAGAAAA 

ATCATTGATAAATTAATTAAATTTACTACTTTAACTCACACACCTAAAAACCTTAATGAAATTTTAATTA 

CTGAAGTCAAATTAGATAATAAAACCGAAATTTATGTTAGTGATTATGCTTGTTCATTTGGTCGTGATTT 

TAAAGCTCAATTATGAACGGTGGTTTTATTTAAAATAATCAAGAAAAATAATCTTAAATTGAAAAATTGG 

GATAAAATTGTGGAAATTATTGAAAAATTATATCAATATTCATTGATTATTGATGAGAAGGATACTACTA 

CTACTACTACTACCAATGATAATAAGGAAGGTGATGATGAAAAGGATAATAAGGAAGCCACTGTTGAGAC 

TGACAACTCAATATTGAAATTATTGCCTTCAAAAGATATTAAAAAATTCCCTATTAAAAGAATAACTAAT 

GATCTGTTTCTTTCAATATTGAAAAATTTAATTGATAATCAACCTACTGAAGAAGAAATTCAATCAACTT 

TAGCAGCTATGGATTGTATTAAATCATTAGATATCTTGAATGTATTAAGAATTGTTGCTGAATCCAAGAA 

ACAAGCTAACTAAATCTAAACAATCTAAACATCTAAACATCTAAATATATATATATATCTATTGTATTAT 

TATATTTGTAAAATTTTGTAGTTTGCAGTGGTTGGAATAAATGATAGGAGGATGTTCCATTTGTGATACA 

CTATTTCTACAAACTGTCAAATTCAATAATCAAACTTGTTGCCAAGAAAAGATAACAAAGAAGGCTATTT 

GGTTTACAAGGTACAACAAGAACATGGGTATATCACCACGATAGTTTAGTAATTTTGTAAATCTTCTTTC 

TCTGTTTTACTTAGCCTCATTTAGTCCTTTCTTTCAGTTCCAAAGTAGGATGTGCAACATGGCCAATTAT 

CAACAATAAGCTAGCATTGCATAATGGTAGTGATTGTACTGAAGAGAACAATACACTAATCTATTCCATT 

GACGACGGAATAAGTGGACTGATAATTCACATGGATAATTCAGTCCACTCTGAGAGGAATTTCCTCTTTA 

TATAATAGAAAATTCCTCAAGGTATTAGATTGTATATTTTCTATAGATAACTAACCTTGAACACAAGAAT 

ACTATCGCCTTTCGTTGCAGATTATCGCTCAAAACTTTTCAATAACTTTTGGGTCTTTTTTTAACAATAA 

CCAATAAATCATTACAAAGAATTACAAAAAGGGCTATAATGACAAATTTCACATAGATAAGAAATATAGG 

TTTTATTACTTTTTGCATAATTGCTGACTTCTATTTTTGGTTTGGAGATATTTAGAACGTTTGATTGTGG 

GGGTATTACTTCCAAAAAAAACAAAAATTTGTAAACCCTGACGATCCTGTATATTTCGTCATAATTCACA 

CATTCTTAAAATTATGCACACATCCTTGAAATGTGTTAATATTCCCAACATTATCAATTATATGTGTTCA 

GAATTGGTTGCAAAGTTATCAACTCAATTCACGCTATATAAACCTTACAAATTCTCTACATTTTTATATT 

TTTTTATATTGGCTTTTCTTTTAGAATCAATCAATACTTTTTTTATCATTTAGATACATCTTTCATCTAT 

TAATAGATTATCTTTCTATATATCAAAACACGACACAGTCACGTGCCAAAAAGGATATAAGAAGGAACTT 

CACCCCCTTGCTCTTCTTATTATTGTGTGTGGTGTAAGTTCAGCGGGTAGTCCTACCTGATTTGAGGTCA 

AAGTTTGAAGATATACGTGGTGGACGTTACCGCCGCAAGCAATGTTTTTGGTTAGACCTAAGCCATTGTC 

AAAGCGATCCCGCCTTACCACTACCGTCTTTCAAGCAAACCCAAGTCGTATTGCTCAACACCAAACCCAG 

CGGTTTGAGGGAGAAACGACGCTCAAACAGGCATGCCCTCCGGAATACCAGAGGGCGCAATGT 

>retrotransposon_16 2099bp public: 1..2099; Tca3-like LTR: 1565..1878

ACATTTTTCAATATTGAAAGATAAATATAGCATTCCAAAAAAAAAAGTGACTTCTGTGTTCACATTTAAT 

CAACAAATTCCCACAACAGCTTGCACAAACTGCTATCTACTAGGCTTACGAGACACAAGTGTTACCAAAT 

AGTGATACACTTATACTTTAACTCATAGAAGAGAATTAGATACTCGGAATATTACTCAACATATTCCCAA 

AATAATCGTAAAGATAAATCTTTGAGAGTTAATACTAGAGAGCTCAATTCTAGGCACAAATACCACACTT 

TTTACGAGTAGTGGGTAAGAGTTCGTACACATGATGCAACAACTTTCTAGTACCTACTTGCACAAAGTGT 

AGTTTGCAAAAAACTTTGCTCCTCCATAGCATGTATCTCAATACTCCAGAAAATCCGATAAAGCAACTCT 

CCGATGGTCATGCAAGTATTCGCCTTTCTCTTTTGTAGATTTATGTAGTTTCAAGATGACACTGAACTCC 

TGAGTATTAAAGTAGATTAATAATAGAAGGTATTGCCTAATGCCGAGAAAGTAAACACCAGATCAAATAT 

ATGCTTTACTATGAAACTTGTTTGTGTTGTGTGGATTGGCCAAACAAAGATCATGCTGATATCTGTAAAT 

CTCTGGAACGGGGGATAGGAATAAACTTGAAACAATATAAACGAGGTGTTTTCCTTTTCTGGTGCTTGAT 

TTGAAACGTGTACATTCCCTCTTTTTCTCTTAGTTAACAATATTGCATAATAGTGAGGATGTGAGCGTAA 

GACAGAAAGCAGCAGCATGGGAATAGTTCAGCCTATTATTGTCGCAAAGCTGCATATTGCTTCTTCTATT 

AAACTTTTGAATCTTCTCTTTTAAGTAAATTAATTAATAACTTGATTGTTCCATTTACATCCATTTTCTA 

TTTCTGTGTAATCTTCGTTTATTTTGCGGTTTGAATACTTCCAAATTTAATTAAATTTGTTCCTAAAATA 

GAAGCTGTTATACTTGCGCCGCCAAACCCATTTTAATAGTGATCCTTATTTCAATTTAATTTGTTCACGT 

TATATCTCTGAATTTGATTAATACTTGCTACAGATATTTGGAAATCATAATTTATGATTTCTCCGGAATG 

TAACTGAGTGGCCAGAAGATATATAGTAACACATAAATACGTACACAACACCAGAACAACCGCAACATTC 

AAGTGGAACTAGTATGTGTTGAAAAAACAGACAAATTAATCGGGATAGGAAGAGATGGGAAAGGGGGGTG 

AGAGAAAAGCAAAGAAAAAAAAAAAAGAAAAAAAAGAACAAAAATCAAATGGTACAAAAAAAAAGACACA 

TCTTCTACACAATTAACAAAAACTGCCTTCTGATGGCAAGAAATCTACCTCACATACATACTTAAATGGA 

ATAAAGAAAGTAATCTATAAAAATAATTTAACATGACTAACGTATTTCAAGTAAAAAGGTCAAAATTAGA 

GAACCCACCACAATCAACTATTTTCTACTCTCAATTGTTTTTTCTTTTTAGTTCTTATAATTATCTAACAT

TTTCCTTACTCAAATCTTTCACCTTGACGATCCTGCATATTTCGTCATAATTCACACATTCTTAAAATTA 

TTCACACATCCTTGAAATGTGTTAATATTCCCAACATTATCAATTATATGTGTTCAGAATTGGTTGCAAA 

GTTATCAACTCAATTCACGCTATATAAACCTTACAATTTCTCTACATTTTTATATTTTTTTATATTGGCT 

TTTCTTTTAGAATCAATCAATACTTTTTTTATCATTTAGATACATCTTTCATCTATTAATAGATTATCTT 

TCTATATATCAAAACACGACACAGTCACGTGCCAAAAAGGATATAAGAAGGAACTTCAACCTGTTCTTTT 

CTTTTTTATTTTTAAATTTGATTATTATTAATTTTTTTTTCCTTTCTTTCCTTACCAATTTTTCTTTGCT 

TGACTTATTCAAAAGGTGAAACAGGGATTTTCCAATTCACATAGCCAAAAGTATTTTTGGTTTCCACATT 

CCTTCAAAACAATATTTGTGCTACCTCCCCCTTCCCACCAAAAGTATCCGATTCCAACCATAAAGCAGC 

>retrotransposon_17 3284bp Incyte: 1..2749, public: 2750..3284; Tca3-like LTR: 2750..3063

TAATAAGTACCAACTAAATCAAAACAAGCGACCAAATTGAATAATAGGAAGACAAAAAAAAAAGAGAGAA 
AACAGTACCAAAATAGATATAGTATGTAGTTACATTTACTCAACATAGTTATTAGGTACAAATCCAATTC 

TGTAGCTCTCATCATCAATTCTTGAGACTCCAATCAACCAATTTAACTCATCTGAATGATACAATGTATC 

AATATTCTGAAAATCTAATAAAATTTCAATATTATCGCCCTGTTTAAATGACAAATCACCTGGTTCATAA 

CCACTAAAATCGTATTTTGCAGTTTTCAAAACTTTATTATCGGTGTTAATGTTCAACTTTTCAAAAAAGC 

TTTGTATCAAATTCAACTTGTAAGTCAAACTCATAGGCTTTTCAAACGTAAAAGGTTCATACTGGATTGG 

CTTGGTTGTGATTGGGCTTTCCTTAATCTCATTCTTACTGCCATTGTATATCCTTCTTAATTTAGCTTCG 

GATGAATCATGGTTTGAGTACGAAACACTTGACATGGAGCTAATTGATGAAGCTTCTGACATAATAGTTG 

CGCTCTCGTCTTCAAAATCTGATAGCAGTATAGAATCCATAGAATCTGTAGAAATAGAATATAACCGTGA 

GGCACCTGCAGAAGACATTGGCGAGACAAGAACAGAATGCCTCATAATAGCAGTGTTTGACCTAGGTGGC 

AATTCAGGACCATCTTTCTTCGGCACTGCTGGTACCTTTATATCTTCCTCATCGACTAATTTCCGTGGAT 

GATATGTTTCCGATGGGTTCATCGATGGATCTTGGTACTGTTTGTATGCCACCAAGGGATCGATTTCTAA 

AGTATCATTGAATATGCCATTTACCTTGTCTTTTGTATTCACAACATGTTTCTTTTCAACAAATTTATTA 

CTCATATTACGCCAAAATCTGTAATAGTTCAGCAGCGAATCTTCATCATTGATCTCCTTATCAAGCAAAT 

CCGGGTGTTTCTCGTGCACAATTGTTAGAAGAGACTCTATCTGCAACCTTGTAGCTGTACTGTTCAGTTC 

CCAATCGTCTATTATTTCAGTATACGATTTTGGTGAATTTTCTTTAATCAATCCATAAAACTCTGTAAAA 

TATTGAAAAGTATCAGTTAGCTTTTTAAACGTCTCCAATTGTTGACATAATATCATCTTGGTAATATTTT 

CAACAAACTCATCAAGAAATGAAACTATGTTAGGCAATAATTCAATACACTTTTTATTCAAGCTGTTGAA 

CGCAGCATCAACTGTCTGATATGTTGTTTCTAATTTCTCAAGTTTGTCATTATCTTTCTCGTCCAATGGA 

ATCGCTTTCTGGTTCAATTTCTCAATTTTGCGATGCAAATGATCCTGTTCTGTTCGTTTCATATTACGCT 

TTTTAATCAATTTCAAAGTTTTCTTCAAGTATTTCTTCATTTCGTCAATTCTATATTTGAGAGATTCGTC 

ATATGCTTCCCAATTATTTTCCAAATCAAATTTTAAGTTCTCCACCGTGATCAAATAATTATTCAACTCT 

TCATTTATAGATTCATTCAAAAATTGCATCTCCTTTGGGTGTACATGTGGGATTTCTTGTGTTGCTTGCC 

ATGAATCAAATTCTTGGTAATACTCGTTGATTTTATCAAAACGCAAAGAGTCTTGACCAATCAAGTTGAT 

AAATCCTTTAATAATTTTAATATTCAGGCCGAGCACATGTGGCAAGAAACTCTTGGACAAATGGTGATTC 

TGCGATGTGATGTACTTCAAACCAGAAACTGATTGTTTGATATCGTGATAATAAATCTCAACAAGTTCAT 

CATCCTTATCGTAATCTCTGGTGTGGAATGTAACTGTGTCTTCAATGTTGTAGGATATATTTTTGAATTC 

TGATTCAGTGTACTTGTACCCGTCCTTAATATGAGTTCCAATATTAGACGATATCAGAACAATATTATTT

TTCAATTGATCCACAACCATCGTTGTCTTTTATCTATCAGTAGTAAATTGAAAGGTGGGGGGATAGAAAA 

TGAACTAGAAAAAGAAAGTGATGATTCTAAAAAAAAAATTTCTCAAATACAAATACTAAGATAAGTGTTG 

ATTATATGACAACAGGGTTGGAAAGTCAATTATTAATTAAGGACCATTGTAGTTAAGCTGCGCATAGAAG 

CAGAAATGTGTGCAAGAACAGGAACGGACGGGAAAAATAATAAGCTATTTGAATTAACACGAAATAACGT 

GACCTAAATTAAAATAAGAATAAGGAAAAAAAAAAAAGATAGGCTTTGAATTAATGGTTTAGTCACTTTT 

GAACTGATAATTGTTGATCTTGAACTAGTAATGATTAGTTTAAAAACCCAACAGGAACACTTAGTTTGGA 

AAATATGAGTCTCCATAGATCTTCTCTTTAACTTATGCACGGAGCTTAAAAGTACAGTTAGACTCAAAAA 

CGAATATTTTAGTGCAATCTCTACAGTATTGGGGTCTGCTCACAATCAAGAAGAATAACCATTTAAAGGC 

GCTCTGTTGTAGAAATTGTTTGTCTCTACAAACGACCACGATTAGTAAGAGAGGGGAGGAAAGACAAGAA 

AAAAGGGGGTAATCATGATAATTGCTAAAAAGTTGAATTTTTGTAAAGTCCACCCGAGAGTTGGTAGCTT 

TTTAGATTCTAGATCTAACAGCAGTTCTCTGTACCGTGTCAAAATATCAATTGTGGATCCAATACAGCTA 

TTGTAGTGGTACTTACTGATGACGATCCTGCATATTTCGTCATAATTCACACATTCTTAAAATTATTCAC 

ACATCCTTGAAATGTGTTAATATTCCCAACATTATCAATTATATGTGTTCAGAATTGGTTGCAAAGTTAT 

CAACTCAATTCACGCTATATAAACCTTACAATTTCTCTACATTTTTATATTTTTTTATATTGGCTTTTCT 

TTTAGAATCAATCAATACTTTTTTTATCATTTAGATACATCTTTCATCTATTAATAGATTATCTTTCTAT 

ATATCAAAACACGACACAGTCACGTGCCAAAAAGGATATAAGAAGGAACTTCAACCTGTTCTTTTCTTTT 

TTATTTTTAAATTTGATTATTATTAATTTTTTTTTCCTTTCTTTCCTTACCAATTTTTCTTTGCTTGACT 

TATTCAAAAGGTGAAACAGGGATTTTCCAATTCACATAGCCAAAAGTATTTTTGGTTTCCACATTCCTTC 

AAAACAATATTTGTGCTACCTCCCCCTTCCCACCAAAAGTATCCGATTCCAACCATAAAGCAGC 
AATAATGTCAATTTATTACCAAGTTTCCAAAGTTGTCTTGTTGGTAGATTATATTGTTTACAGATTATGG

TACGTTATAAAGGTACTAATAATGATCAAAATGAATTTGCTGATAATATAGTTAAACTAGATGTACCAAT

ATTAGTAGGATAAATAAAGAATCAATAACCATGGCACGTGAATATGAAAAGGTAGGGGCTAATATAAGTG 

TAAGTGTAGTGTATAAATTACAAAACAAAAAAGGCTGTTGTTATTAAGATGAGTCAACTGTGTAAGTGAC 

GATCCTGCATATTTCGTCATAATTCACACATTCTTAAAATTATTCACACATCCTTGAAATGTGTTAATAT 

TCCCAACATTATCAATTATATGTGTTCAGAATTGGTTGCAAAGTTATCAACTCAATTCACGCTATATAAA 

CCTTACAATTTCTCTACATTTTTATATTTTTTTATATTGGCTTTTCTTTTAGAATCAATCAATACTTTTT 

TTATCATTTAGATACATCTTTCATCTATTAATAGATTATCTTTCTATATATCAAAACACGACACAGTCAC 


GTGCCAAAAAGGATATAAGAAGGAACTTCATCTTGATTGCGCCGCAAGCAACAAACAATAAGCCAAGGAA 
AGTATATACTCCAGATCTACTATGAGTATGACACAGCTTATTAATGATCAAGTCTACAACTTCTACTACT
AAACACGTTCTTAACAAATCAAACAGTATTCAATTGTTTTAAAAAACACTATACAAAATTAATCAATAAA 
AAACAACTAAAGCTAATTCTA 
TGGGAATTATTAGAGGATTCTTTTTCAGTGGATATATAAATAACGAATAAATTCCTTGTTTAATTATTTT

AAGGGAAGAAAAAAAAAATAATCAAACAACCAACCCTCTTTATAATTAACAAGACTACAACTTAATAAAA 

ATGGGATATCCACCAAATTTCAAAATTGTTACTAAATCATTAACAGAAAACATTTTATTAGCATCAACGG 

CTTTTTCAAGAGTTGATAAATTCAATTTTGGTGCTCGTATGGCGGTATTTAAATTTCCTCAATCAAATAA 

AATCATTTTATGGTCACCATTACCTTATACACCACAAGTAATTGATGTTTTGACAAAATTTACCAATAAT 

ACCAATGAATCAAATTTAAATATTGCTTATGTGATAATTCCTGATCGTGAACATAATTTAGCTGCTAAAT 

CATATAAAGAAAAATTTCCCGGGTGTAAATTAATTGGAATGGAAGGATTAGATGAAAATTCATTGAAATT 

GGATTATAAATTTATAAAACTGATGGGTAATAAAGTTTTAAAAAATGATGAATTAAAACAAATCTTTAAT 

GACAGTGACAGTGGCTTGATTGTTGATAATTTTGAATTTGTTTATTTACCAAATCATGCAAATCAAGAAT 

TGGTTGTATTTGATAAATCATCATCAACATTATTTGAAGCCGATTTATTATTCAATTTAGGTGTACCGGG 

GTCAACTCTGGGTGAAACCATTTTAGAACAATATTCACCAGAGTTGGGGTTCCCTAAAGGGTTTAATCCT 

CATTCTGGTTGGTCATTTATAACTAGATATTTACAACCATATTCTAAAGTTGGTCGTTTCTTATTTAGAA 

AAATTGTTGATATAAATCATAGTAAACCTGGATTAGAAGCTATTTATAATTCATGGGATTTTAAAACTAT 

TGTTATGTGTCATGGAAATATTATAACTAAAGATGCTAAAGAAGCATTTAAACATGTTTTTGTATAAAAG 

TAAAAGAATTGAAGAAGATAGTCAAATAGTAATAATCAGAATATATGTATGTTTTTTTTTGAAGAAAATT 

AAAGAATATATTCACGAAATAATAATAATAAAAATAAAAAGACTAACTATTTTGAATAGAAAAAAAAGGT 

GGCACTATTTCAATGAGATAAACCAATTGTGAATATACGTAGATGCCTTGCAGCAGACAATATAACCAAA 

TGTTGAACAATATGTGGGATAAATAGCATTTTCATCTGTGCCATTGATATTGCATTTATATCCTATTGTT 

GAACAGTGACAGCACCTGTGGCGGTGGCTATTACATAACAGAACAAGTGGAACAGCAGTTACCAGTCAGA 

ACAGATCTAACAGCATTGTTTTTAGCAGCAGCATCTTTATCTTTGGTTTGACCAGATCCAGTTTTTTTAG 

ATTGTTGTTGAGCAGCCATTTTTTATTTGAATTTGTTGATTGAGTTAATATAGTTTATAAGAATTGAGAG 

TTACTTGTTTGAGTTGTTGATTAAGAATAGATTAAACAAAAATATACAAGAGAATCTGTAGACATATTTA 

TACTCATGAATTTATATATATATCTATGCTTATATTCATTTGATGTATAAATTGACATGATTATGAACTG 

CAAGAGGTTTGATTTTGATTTGTCTGCAAAAAAAATATGCTCTATTTTTCGCAATTACCCCCCAACCCCC 

CCCTCACAAAGTTCCGAGTTTAGTTGGAAAAATGTTTCGATAGAGTAAAATTTCAGGAACAAAATTGACT 

AATTGGGAGATGACAATGAGAAACAGTTTTGAGACTTGATCATACTTCCCCATACGCTCACCTCTTTACG 

TTAAATATAGCTCTTTACGTTCTCTACAATAATTTTTTTGACTTATTGATATTTCTTAAAATGGTTACAT 

GAAATAAAACAAAGAGATTCATAGGAATATTACTTTTTCAGGTAGACACAATGCAGCTAAGGTTGGATTT 

CTCAGGAAATATCATTCAAGCTTTATCTGTTAGTTAGTGCTGTTATTTATTACTGGTGAACTACACCAAA 

GCATACTGAAGGCATTTTACGAGGTTTTTGAAAGCTCTTACTATGTAGCAACTCATCTAGTACTTAGTAG 

AGGAAGTGCATCAAGTATGGATCAACCAAGTGTTACCTTATATCATTGGTTTAAACATTGTAAGACTCAG 

TTCGAAAAAAAAATTAAGGTTTCTACTTACCACTTTCATGTGGCTTAAAGTTGTGGATGTGATATTGAAT 

ATGTTTCAGATTTGTCATGAAACAATAAGAACAATAATAAAGAAGAAATCAAATCAATCTTCAATGTATG 

TATGTTTCTGTATGGCGCATGTGGGTTCTTTGTTTTAAAAAAAAAACTTTAAATTGAGTTTGTTTTTTCT 

TTCTTTGTTAGTCAATCAAACTTTAAAAAAGAAGAACAAGTAGAAATAGTATAGTAAATTGATATAGATA 

CTTTTATTACTAATAACAAATCTTTAATGGAATTTATCTGAAATTAATTGTCAAGTTTTAATTCAGTAAT 

GATTGATATTACTCTAAAACAAATGCTGTGTGGGGTTGTTTTGTTTGACCTGAAGTGTCCAAGCTTTCCT 

GCTTCATGATCTAACTCTTTGTACTGCTACACCTACATTGGGAAATATTGACCTTATAGTAACACTTACT 

TTCTTTTATTAATTGTCTAAACTATGCTTTTGATCAATTCACACGTACTTCATTTCTTCTCCCCTGACGA 

TCCTGCATATTTCGTCATAATTCACACATTCTTAAAATTATGCACACATCCTTGAAATGTGTTAATATTC 

CCAACATTATCAATTATATGTGTTCAGAATTGGTTGCAAAGTTATCAACTCAATTCACGCTATATAAACC 

TTACAATTTCTCTACATTTTTATATTTTTTTATATTGGCTTTCTTTTAGAATCAATCAATACTTTTTTTA 

TCATTTAGATACATCTTTCATCTATTAATAGATTATCTTTCTATATATCAAAACACGACACAGTCACGTG 

CCAAAAAGGATATAAGAAGGAACTTCACCCCCTTGCTCTTCTTATTATTGTGTGTGGTGTAATAGTTTAT 

GGTGTGGTGTATGATTGCGTGTGTGGGTGCAAAAAAAAGGTGAAGAAAAAAATACCTCAAAATAAAAACA 

ACTTCAAACATTCCCCTCATTTTCTTTCACAGTCATTTGGTTTCAATCTCTATTGGTCTTCTTTAATCAT 

CACTATTTATTCCAGTTTATAAGTCGAAAAAAGTTAGTTCATTGTTCAATTGGGTTTATTTATATTTAAT 

ACTATGCACTTGTTCTTCCTTGACTAACTCACATGAGAAAGAGAGAGTGAGGAGAGGGTGAATCTATTCT 

TTCTATTGATTATGCATAATTTTCAATCAGGTGATAAATAACATTATCGATTGTTCTGTGTATACGTTTG 



CATATCTTTCTTATCTATCTTCATAGTAAGAGAGAGATTAGATATCATGATATTGAATAGAGCGTGTAAT 
TATCAATTCACTATCATTGTAGAACCACCCTCAGTTGATCTTGTAATTGAAAGTTACAGATGAGTTGATT 
ATGCGTATAGGAAAGTATTGAAGTAAATAAAGTCCGTGTGTATTATCTCTTTTTCTCCGCATTTTATTGC 
TTTATCATTCATCATCTCTTTTCTTTTCTTTTTATTCTTCCTTTAATACAATAGTGGTCAAGGGGGGGAG 
GAGGAAGAAATTGCAATCTATAGTAACATTGATGTTCCCCTCTTTCTGATTAGTAATCCCCCTTTCACTA 
TTAGCAACAATAAACTATATATATATGTATATCAAACCTACCTTCCTTCCGGTCTTCATTTTTGTTCTCT 
TTTCGTTGACTAGAACTTTCTTAACAAACTTCAAAACTATCATGCCCGATTTATTTGATAATATTTTTAA 
TAAAATTGGTACAAAATTCACTGGTGGCAAAACCACTCATCATTATGGTGGTGCATCTCAAGTAAATACC 
GGGAAATGGTATAGTTATACCAGTAGTGCCAGTAATAATAATTATTGGTTACCTCGAGAAAGTCAAACAA 
AGACACCAGGTACTCAAGCAGAAGAACCAGAAACAGTTCAATTTAAAGTGGATCGATCAATGAGTGTTGG 
ATCAATTACTGAAGATTCTGGTGCTGCTGGTGCTGGCGGTGATCGATCAAGAATGAATAGTATTACTGAA 
TAATTGTATATACAACGTATATAAATAGGCTGGTCTTATTATTATTGCTTTTAATTTAGTATCTTTTGAA 
AGATAAATTGGTTAGTGACGTTTTTTTTTTTAATAAATTTGTTTCTATATTAATATAAAATTCAGTTATT 
ATTATTAATAGTAATCCAATTGTAATTATTTATAATGATATATATAAATATATTTAATATACAGTTTGTT 
ATTATTATTCTTTAGTTTTGCTTTAAAATTTATTTTACTTTACTTTACTTTATATGATATTATATCTGTA 
TTAATGACGAACTGAAATTGGTGAAATCGGCATTAGATTATGGACTGAGGATAAAACAGTTGAATAAGGG 
GGAGGAGGTTTGATGTGGTGGTGTCATATCA 
AATGGGTTTATACAATCAAGGACACCGGTCGCTACAAGGCTCGCCTTGTGGCACTTGGTTATCGACAACA 
GGCTGGTGTGGACTTTCTCGAAACGTATGCTCCCGTGATTCGTGGAGAATCAATCAAACTAATCTTTGCA 
CTCGCGTCAAAATCCAAACTAAAGATTCATTCCATAGATGTTACCACAGCTTTCCTCAACGGGGAAATAC 
TGGAACTCATATTTGTGAAACAACCTCCGGGATATGAAGATAAGAAGCGTCCTAATCATGTTTGTAAGCT 
CAATCGCAGCTTATATGGGCTTAAGCAGCTGCCACTAATGTGGAACATTAAATTAAATGATGTACTTATA 
AAGGAAGGTTTCCGTCGACTTGGTGGTGACTTAGGGATATACATTAGTAAGGACAAAAGAACAATAATGG 
GAGTTTATGTTGACGACATTCTCATTTGTGGACCTTCTGACAGTGAAATTGAACAAGTAAAGAACAACGT 
GAGAAAATACTTCTCAATAACTGATAATGGATTATGCCGAAAATTCCTTGGAATTAACGTCTATCAACAA 
GCAAATGAAATAAGATTAAGTTTGAATGATTATATAAGGAGAATGATTGAGGAGTTAAAATTATCTGTCT 
CAGAAACAAACCCAGTATCTATACCATCTGATGTCAATTATGAAATATTTAAAGTTAACGAAAATGATGA 
TGAGAAACCATGTGATCAAACCAAATACCGAAGTTTGATAGGCAAGCTCTTGTTTGCCAGTAATACTATA 
AGGTTTGACATCGCCTATTCTGTCAACTCCCTATCCAGGTTTATCAACGATCCCAAAGAAAAACATTGGA 
TTGCAGCTGTCAAGGTGGTAAAATATCTCAGTGGTACTCAACGGTATGGTATTTGTTATAACGGTAACGG 
TGACTTGAATATTTACGCTGATAGTGATTGGGCTTCCACTCCATCTGATCGAAAGTCTATTACGGGGTAC 
ATTGTTACCTATGCTGGAGCGCCGATAAGTTGGCGTTCCAAGAAGCAGAACGTGATAGCCTTGAGTACGA 
CAGAAGCGGAGTTTATGGCTCTCACAGAGTCCATAAAGGAAGCCCTTTGGCTAATATACATTTTTCGAGA 
TATTAATGTGATATTGAAATTACCAATTGTGATATATGAAGACAACCTACTGTGTCAGAAATTACTTGAA 
AATCCTCGATTCCATAATAGGACAAAACACATTGACTTGAAATATAAATTTACCAAAGACCATATAGAAG 
CTGGTACAATCAAAGTGGAATCAACTAATTCAGCAGATAACTTAGCCGACATGCTAACTAAACCTTTACC 
AAAAATTAAATTTAAACATTTAAGATGGCTAGCAGGATTAAGACCTTTAGATTGATTAGATAATGATAAA 
ATGAAATAAAGATTAATTTGGAGATGCAGGTTGATGGGGAGGATGTTGGAAAAATGAAATATGATCAATC 
CTGCATCTAGAACCTGTGGCAGAATGAAACCTACGAGATTATGAATGACTTGTGAATACAAGTTGAATGT 
TACAGAATGTTACCAAGAAGGTTACACTTGAATATATGAATGACTAGAAAGTGAATTGAATGTTACAGAA 
CCTGAATAACAATGTTACACGAATGTGTGAATGATATGAGTTTATCTATAGTAATGTGACATATACACAA 
AGGTGTGAATGACCGAGAAAACAGATGTTACATTACGGGCACTGGAGAGTGCAAGTCTAAAGAATCTTGG 
AGTAGAAATAAGTAATATAAAAAGGACCAAAGATTCTTTAGAGAAAAGTAAATGAAACTATATTAGATTT 
TATATAACTAACTAACAAATAAATAAAAAATATAATATGTCTACAATGCCACCAACTTCCAAACGTACTA 
GAAAGAGAACTAGAACCGATGATAATGCTGAACCAACTATTCAAGATCCTTCACCGCCACTTGCTAATGT 
TGAACCCACAATTCAAGAGACTCCACCGCTGGTTGAAGTTAGTGATGAGACTAATTCAACTGAAATCAAT 
GAGACAAATAGTAATACTCATGAAGAAACAAATGTATTAACTAATGTGCACTCCTCTCCAATCGAGACAG 
TTACTGAGAGGAACTTCAATTTTCAACAGGTTATTGCCTCTATCTCCACTGTGGACAATCAAAGTCTCTT 
GAAGGATAAAATTTCTTATGATCATTGGTTCAGTACCTTGAAAGAAAATGCAATCATGATTAGTCCAGAT 
TTTCTTGACTTTATTAACAAAGACACCATGGATCTCCAACAGTACCCAACTGTCTACCAAACATTCTTAG 
ATCGTCTTATTTGTGCCACAATTGACCCACATATCAAACAATCTTTAAAATATCGGAAGTTATCAGGAAA 
GAAAATGCTTAGTGAAATTATCTCTCAATTTGGTTCTATGACTATTAAAGACAAGGTTAACTACTCCATA 



ATTATGGCTACCAAAATTCATTCTGATGTCACCACTCATTTAGACAAAATGAATTTACTGGCTCAATTTT 

ACGCATTTCTTATGCGTCAACCTCAGGACCTTAAACCTGCCCTTTTACTTATTGCGGGTATCAATGACTC 

ACGTTTCAATGAAACATACTTTCACGATAACAAAGAATTAACGATCTCTAAGTTGGAACGGTATATCATT 

AATCAAAACTCCAAAATTACTCCGTCGGTACCAACACCTTCTCCACGTGACGCTGTTACGGGTTTACTGG 

TTACCCAGCCTACGTCCGCTCTGGGACAAAGTGAAGTGTTTAATACACAATGTTTTAATTGCTTTGGGTT 

GGGCCACACTGCACGTCGCTGTGCCTCTCCGAAACGTCTTGGCCAAATAAACAACCTTAGATCTAAATTA 

CTTGCGTTTGAAACTCGATCCAAATCCAGAAAGCGTTTTCCACCTCAACCTCCTCCTACGAATCGGTCGG 

CAAACTCAACAATAATAACTAATCCCTCACCTACTGACGATACCATCTCGTCCACCACTGAAGATTCTTT 

TCCACGGGACGTCTTTGGATGGGCGGCATCATCTGACCAAATCAAATCAAAGGACAACCTTTCTTTATTT 

TTTGACACAGGTGCCTCGGCACATCTTATCAATAATCTCAATCTACTTCATGATTACAAACCCTCTAAAG 

AAAACAAACATGTGATCACTGCGAACGGTGATAAAATTCCTATCTTAGGAACTGGAACTGTGAAACTCCA 

ACATGGTCAACACAAGATATCACTTCGCAATTGCCAATATTCTCCACATCTACACATCAATCTTATCTCA 

CCCAGACTCTTACTTGATGATTCCACTAGCATGACTATCACCCAATCCGGGATTTATCACTCCAAAATTG 

GACAAATTGGGTATTATTCGACTGAAGATGGTAATCTAATCAAGTGTATGTTCCGTCCCATTACCATTCC 

TCATCTTTCGTTATATTCTCAATATGTCGAAATGGGTCTTCAATCTAACAATGTACTACGTAACATTCCA 

GCTTTCACGGTCCATATTCCTCAACTACATGACTCCCTTGGACACACATCTACTCAACAAGTTTCAAATG 

TCATGAAACGTTTCAATGTCACTACTGACAACATTGGTACGGACTGCGAAACTTGTCGGCTTGGAAAAGC 

CATTACTCAGATTCCCAAGATCTCAACCCATACCATCTCTAGTCATTGCTTAGAACTACTTCACGTTGAT 

GTTCATGGACCAATATCCGTTCCTAGTATATTTCAAGAACGTTATTTTCTTGTGATCCTTGATGACTACT 

CAAAATACTTGACAGTTCAACCACTATGCAACAAATCTGATGCTACTGCCGAAATTATCGAATTCATCAA 

TCATTGGGAAAAGTTCTTTCTGGGAAATGGCAATTACCATACGAAAATTCTCCGGTCGGATAATGGAGGG 

GAATTCTTAAACAAAACATTGACTACCTATCTTGATTCAAAATATATTACTCACCAAACCTCCAATGCCT 

ATGAACATCATGAGAATGGCGCTGCAGAACGAGCTATTAGATCGGTTAAAGACATGGCTCGAGTAATATT 

GCTTCAATCCAAATTACCAGTGCCGTTTTGGTCCCTAGCAACCCGATGTGCTGCGTTTGTTATGAATCGT 

CTTCCTCATAAAACAATAAATGGTT^AGATTCCTTATGAAGTATGGACTAAACAACTTGTCAATCTCAAAA 

TGATGAAACCGTTTGGCTCTCAAGTATATGTGAAAATTCCTATTGGAGTCAAAAGTTTTTCTGCACAAGC 

ACTTTCTGGAATCATGGTGGGATATGCCACTAATAAGAAAGGCTACCTTGTATATGATCCCACACAAAAT 

CGAATATTCACATCCTCACAAATAATATGTCATCCGAGCATTTATCCAGCAGCCAACCTTACGTTTAACG 

AACCCTTAATTATCTCATCGAAAGTCACGGCTGCTCATCTTCACCCCCTTACCATTTCCAATTTAGTTAT 

TCCACCTACCAATGCTGTATCTGAGACACCTCTGCAAATTGTGTGCTCTCCTCAAATTCGTCAGTATGTC 

CCAAAGTTTGCCAATTACAAACTGTCTTGGAACATGGGGAGGATAAAATATATGCACTGATTATACCAAT 

ATCGATCGGCAATATGAAACGCACAAGAACAAATGAAAACAAAATATGCCAGCTAGATGAATCGAACAAT 

ACCACCATACCAGATAGTGTAATTTTATCGGCTAACAATGTGTTATTAAACTTAGAATCGAGATCTTCCA 

TTCCCAAAAGTTATAAGGAAGCTATAACATCTAATGAAAAATCCAAATGGGCTGATGCTATGGATAGCGA 

GTTTAATTCATTACAATCCAACAACACGTGGTCACTTGAACCACTACCGGAGGGACGCAAAGCTATTGGT 

GTCAAATGGGTTTATACAATCAAGGACACCGGTCGCTACAAGGCTCGCCTTGTGGCACTTGGTTATCGAC 

AACAGGCTGGTGTGGACTTTCTCGAAACGTATGCTCCCGTGATTCGTGGAGAATCAATCAAACTAATCTT 

TGCACTCGCGTCAAT^ATCCAAACTAAAGATTCATTCCATAGATGTTACCACAGCTTTCCTCAACGGGGAA 

ATACTGGAACTCATATTTGTGACACAACCCTCCGGGATATGAAGATAAGAAGCGTCCTAATCATGTTTGT 

AAGCTCAATCGCAGCTTATATGGGCTTAAGCAGCTGCCACTAATGTGGAACATTAAATTAAATGATGTAC 

TTATAAAGGAAGGTTCCGTCGACTTGGTGGTGACTTAGGGATATACATTAGTAAGGACAAAAGAACAATA 

ATGGG 
TTTGTTTGATAAAGAAAATAAAAAAAAGAAACAAGGGTAGTAAATGAGTACAGTAGCCCTGTTGAACAAA 
GTCTGCGATAACTTAATTATGGGTGAACTCAAGGGGACAGTGTCTTTGTCTATCATCCGATCCTTAATCA 
AGTCTATTACTGAATATCAATTATTTGGACACCTGTTTATAAATTACTATCCAATCTATGTTCTTTCAAT 
TCTTTCCTTCAATATTTTGCCAGCCAATAAGACCAAACATAATCCAAATATACATACCAGTGAATTCTAA 
ATTGTTTGGTGAAACATCCATTTTTGATCTATTTCAAATTGTATTTTCTTTTAGTAGTAGTAGTAGTAGC 
AGTAATTGATTAATTATTATCAATATCCGAAATGATGATAAGAATAATAATTATATATATAAGAAAGAGA 
AAAAGAGAAAAGAAGAAGAAGAAGTATAAAAGAAGTTGTTATGGGTTTAATTAAAAAAGAAAAAATTCAA 
TGAAATTTGTGTTGTGTTGTGTTGGGTTTGAATTTCTGTATAACTCAATTTGGAGATTTTTTTTTTTTTT 
TTTTTTTGAAATTTTTATTAGTCGTGTACATTGTTACAATTGTTTCTCGTTCCCCTTTTTTTTTTCCTTT 
CTTTGTTTTGTTTTGTTTACCTTGTGATAATTTTATACGTGTTGAGAGGGCTCTCGTCGTGCCCGTGTCC 



GTTTCCGTGTCCTGTTGGGTCCCCTCCGCCCATGCCGCACCGCACCGTACGGTAATGATATCTGATTGTT 

GGAGCGTTCTTCGCTAACAGGTTCTTTATTTTTGTTCGGGGGTTTCGAAAGATAATGTAGAAACACCAGG 

GCTTATAACTGAGAGTTAGAGTAGTGGAGATTAGTAGTAGTAGTACAATCCTATAGCCCAAACATTATTG 

GAGAGATCTTACCAAATAGCAATCATCATGATGTATTTACTACTACATAAAGAATTTAAGACGATATTTA 

CC^GCAATAAACAACATGACCAACTAATTAAC^AACATTTGAAAAACATAAAGTAATTAGAAAGTTTAAA 

AAGTGTACAACCAGTGTGGAAAAAGAATGGAATTGGAATTGAACAAAGTTATTAATTACTGAAAAAGGAA 

ATTTAATTTCTTGAAAGGCAAATCTTTGTTTGTTTTTTTTTTTGGGTCTTTTCTTTCATTTAATAAGCGT 

GGGGTATTAATAGATAATGATATTGTTGTTGTTATTGTGATATTGTTGTGAAATTTGACATATGATAAGA 

TAAGTTTCTTTCTTTTCTTTCAACTAGTATAATTGAACTAAAGACCACCACCACCACCACCACATAGTTA 

GCAACCTGATATGCTGTTCATGTAACAGTAAATTATCTTGGTACTATACCACTTGTTGTAATATAGCTAA 

TGCTAATTCTTGATTAGTGTGGAAAGCCTAATAAGGTTATATTGTGCACAGGTTAACTACCTTAATATAG 

TTATTGTTAATACAGTTATTGCTGTTGACTACTATTGTTATTGTTAAATTAAAGTGTTAGGTTGAGTTAA 

TTGATTAGTGAAAACCAACTAACTACCGTATTAAATTATTGTATTAAGATTGATTCCTATTAAGGATAAA 

ACAGAGAGTGTGTTAGAAAGAGAAAGGGTGGATTATAAATATGTGTAAAATCCCCTTTAGAGACTAACCA 

CTAGAAATCTATTGATGGTTTCATATATAGAGATTAACGATTATATTTATAATATAAGTTGGTAGTTGCT 

AGTATATTTGAAAGCACTACAGTATAGTATGTCAGAATCAGATCATTTAAACTCTACTAATAATACAGGA 

AACACTTTCATTAGTCTAGATCAAGCCAGTACAATAATGGCAGATCAAACTCAAGGAGCTAACCCACAAC 

AATGATAATTCATCTTTTTTGTCAAGACGATAGGTTAATGTTACAAGCACTTTATTGGGCTCGAAATAGT 

GGTAAATAAGTCCATAGATATGACCTGTTACAAGTTATTTCGATGATCAAGCCGGCTCTGTGATTAC 
TTTTTTTAAAGAATTAATTAAATATGATGGATGATAGAAATTAAAGGAAAAAGAAGAAGAACAAAACAAA 

AGTTTAATTGAAAAAAAAGGGAGAAATGAATATTGAATTATTCAGCTTTTATATTGCTGATAGATGTTGA 
AAAAAAAAACGGAAGAATGGGGATAGCAAAACTGTGGGTGAGATTAACTCATCTATGGCGCTAAAAGTCT 
TTTTTTTTTCTCTTTTATTAGGGGGCACATAAATTATTCTTTTCATTGATAATCTCGAGTCCGTTTTTAG 

TTCATTATTCGGAATATATTACCGTATTGGGAACGATAATTATTATTAGTTCTCCCCGATGGTTCGATTT 
TGCTGGTGCAAAAATATAAATCCGATATTACTTTATTGGTGTTTTAATAAATCCGTTTTAAAAGTTCGTA 
GACATATACAGGATGATAATAATTTAACCGATTTATAAGTTGGAATCATTTGGATGAATCCGCTTGGGGA 
GACGTTTTCCAATTTTAGAAGTTTAACTATCAATTTTATGTGACATCCGAGTGTACACATTTTGTGAATT 
TGATCTTATCAACTCACTTGGTGTACCATGGCATTTATAACAACACTTTTTAGAATCGGCTGAGTTACAT 

GCATTTCCTCTATTTGTAGATTAATGGAAATTCATAAAATCGTTCACATTTTTTTCTATAATGAGTACCA 
TTCTGTTTCCATAAGTAGGGGACTAAAAAATAATTGATATCTCTAATCAGTGACAGCTCTAGTCAACTTG 
ACCGTAATGTTTTGACGACCATTATATTTCTTGTTTGAACTATTGATTTATGAGTGTTGTCGTAACAAAA 
GATCAATTCCCGTCAAAACGCATTTGGCACTTAATCTTTGATTGAACCGATTTTGATCTCAAAACATAGT 
ACCAAGGTCAATTATGTTCGCTAATGAAAGAAAGCTGTGACGAAAACCTCAAATTCATGAAGAAAGAATT 
ACTGTTGTGGAAAATAAAAAAGTCTTTCTTCTGATACTTTACAAGTCCCTCAACCACAAATACAAAAATG 
AAAGTTACCCATCGATCTTTTTCATTGGTTAAGT^ATTAATACGAGAATATCAAATTATCTTAGAGAGGGT 
CTCACAGAGCAACTTTCTGAGGCACACGGTCACCAACATGATTTGTTATAAAAAATTCAACCAAATTTTG 
GAAAAAATGAAAACAAAACAAAACAAAATCTGAAACATCCCGAAAGTCACAAATGCTTGATTACTTAAAA 
TTACTTATTTGCTTCAAGACGCTATTATTATTATTATGACATAATACTACTTGAATAACAGTGAACTGTA 
ATTGTATTAAGAACAAATCATAACAAAGGAAGATGATGACGATGATGATGACCCCTTGAAATATCCAGGG 
CACATGCATTGTGATGATTGTTGTAATATAGCTAATGCTAATTCTTGATTAGTGTGGAAAGCCTAATAAG 
GTTATATTGTGCACAGGTTAACTACCTTAATATAGTTATTGTTAATACAGTTATTGCTGTTGACTACTAT 
TGTTATTGTTAAATTAAAGTGTTAGGTTGAGTTAATTGATTAGTGAAAACCAACTAACTACCGTATTAAA 
TTATTGTATTAAGATTGATTCCTATTAAGGATAAAACAGAGAGTGTGTTAGAAAGAGAAAGGGTGGATTA 
TAAATATGTGTAAAAATCCCCTTTAGAGACTAATCACTAGAAATCTATTGATGGTTTCATATATAGAGAT 
TAACGATTATATTTATAATATAAGTTGGTAGTTGCTAGTATATTTGAAAGCACTACAGTATAGTATGTCA 
GAATCAGATCATTTAAACTCTACTAATAATACAGGAAACACTTTCATTAGTCTAGATCAAGCCAGTACAA 
TAATGGCAGATCT^CTC^GGAGCTAACCCACAACAATTACCATATTATATGAAGAAGACTATAACAAA 
ACTGTAGATAGTAGGGGATTGGTTATTTCCGGGGAGTAGAAGTATTGGGTTATCTAAGTCAATCTTTAAC 
AACCAACAATCAACAACAACCAACAACGTTTTTCCTATTCTCGGAGATAACTTGATTAACTTAAAAATTT 
TCTTGTCAAAAAATTTCT 
TAATTCGCGTATGAATGAGATTGATGCCACTGTTGGTGCTGAAGTTTTAAAAAGAAAACAAATGGAAGAT 

ATGCAAAACAATAATAGTAATAATGGAGGGAAAAGATTTAAATCAGATCCAGTTTCTGATCAAGAAATAT 



TAGATGCTTGGGAAAATAATCAATTGGATAGGTTTTCAGTGGATCAATTGAAGGCATTTAGAAGAAAATA 

TCCTGATGTCAAATCAGCTAATAAGAAAGCTGACTTGATTGAAAATATCAGTGAGTTTATAAGGACTCAT 

AGAAAATGAGTTAATATGTAATAGTGATATGTTTATAGCTCTGTAAATACATGTAAATTTTTTGGTTGCC 

AATGAATTGATTGAGACTGAAAATCGTTTGTGGTTTGCCAATGAACATTAAACTTATTACTTGATCTAGA 

AGGCAGTTACTTGTTTAAAGAAGTGATGAGTCGTGATTAAGTAAAGTTTGCAGCACTAAATATTGTATGG 

TATTTGACTTAATTTTTTCTGCAAAAAAAATTACAAATTTTTAATGAAAAAACAAAACACAAGATAATAA 

CATTATAGAATAAAGATTATAGGATCCTACCAACATAGTTCCATTGCTGATCAGGACGTTTAATAAAAGA 

GCTTCCCAACAGAGACATATCTTAATAATAACAGGCTATTTTCTGCCTTTAAAAAGCCATCTAGGCTCAA 

AAACCTCAAAATAATTCATCTCCCACCTTGGCAGCAGAGTAGCCATAACACAGCCAAATCAATTTCTATA 

GTTTACATAATATATAAAAGGTTTCTAATAGCCAGTAAGCTTATAGAAATTACCCTTTTCAAGTGATTTG 

ATGAACAAATTATATTCTTGTACAAAATAGTATATTTAAAATTAAGAATTTGGCTTGCAAAAGAAACTCT 

CGGTAGCTTAGTTGGTAAAGCATTAGACTGTAACTGAGTTATTGTTTGCAAACAAACAATTGGAATGCGA 

TCTAAGGATCGGGTGTTCGACTCACTCCCGGGAGATTTTCTTTTTTACCACCACCATAGTTAACACGCTA 

CCATATGAGACAGAAATCTAGCATGAATGGCTTATATACAAGTGGACCATTTAGAAGCATGAGCTGTGTC 

CTAGTTTTTTATCATTTACAATTGAATTTCCCTCTGAAATTAAAATTCTAAGGTATTCATTTATCTCAAC 

TTTCTTAGATGCTGTTAGTGGGTTAAAACTTGGTAATGAACCACTGACGGAAGTTATTTTTGTGAGAATT 

AACTATAAATATATCAGCTTGGTTTTTTTTAACAACTTAGACAGCAATAACCAACACCCAACTAATTAAT 

CAACATTGTTATAAAGTTGTTTTCATCTGTCAAACCAGGCACATGGTAGCACATCAAAATCACTCTCGAT 

AGCTTAGTTGGTAAAGCATTAGACTGTAACTGTTCATTCTGGATATTGATATCTAAGGATCGGGTGTTCG 

ACTCACCCTCGGGAGAAATATTTTTTTTTTGCTTATAATTCCTTCAAATATTTACCTCCAGTATCGGTAT 

TGAATTAAATACAGAGAGCAATTGGAAAGGTTATTTTTTTTGTTATTTATTCCAAAAATTTCAGGACTCA 

AAGTTTAATAAGCCAAAGCCTATTTTGTACTGCGCTTCCCTTTAAAGCCCCTGCTAGCCCCTGGGCTTGT 

TGTTGTTGTTGTGTATGGAACAAGTTTATTAAATCCCATGACGACGATGATGTAATTGATTTTGAGAAAA 

AAAAGGATGT^ACAATGGAAAAAGGTACAATGGGTTATATACTTTGCCATGTGGTTGAAAATATGTTTAAC 

GGCTGTAGAACTTTTTTTTATTTTGTGTTAGTGAGTGAATTTCGCTACAATTGTTATTATACTCCACAAT 

TCAGATTTGTTGATAACGTTTAATTACTTAAATTTTAGTATGCATATTGATATATTTTTTCTATGAGATT 

GACGATTAATTATCGGTTTGTAAAATTCTATTGAAACACATTCACCAGTGCAACAATTAGACATTTTCTC 

AAAACCATGAATAGCTTGCAACTAAAACAAACAATAAGGCTGTACACTTTGCTGGCAATAAATCAGTGTC 

AAGTCAATATAAACAGTCTTAAGAACAATGAGAAACTCAAAAGTTAGGGTAGTTAGTTGATTACAAAAGA 

AAGAGACCACTTAGAGACAAAATAACAAGAAATGACATCACCATTGTAATAGATACATTTTCCAGTTATT 

CAAGCAATTGATTGAATGTATTCATAGCAAAATACATTTAAGACATACAAGCTTAAACATGGGTTATTCT 

CTAGTGGTGTTGTTGTTGCGATTCTAAGACTCCAATCTATGATTAATAATCGGATCACCATTTGCACATG 

AACTACATTAAGTACTAAAAAATATGCAATTCGCCTGTTTTCTTATTGATTAAATTTAACAATAAACTTG 

TCTTTAGCTTTGGCAAAAGCCTCCTTGAAAATCCTAACTAAGCACGTTGGAAGAGCAATGGAATTGTGGT 

TAGTTATAGAAAGCAAAACAATCTGAAATTGTAAAGTATTAGATGATGTGCAATGATATCAGAATAAAAT 

AGTTGCTGTTGAAAATTTTGTTCAAGACTCTTCACACAGCATAGCAAATAGTTATACATAAAGAGAAAAG 

TTCAACGTGCTTTGTTGCCCGTGTCTATTTGTTTTTTTAAAGCCGAATTCACCACTAGAGGGAGTATATA 

TGATTCAGAGTATCACCATCATCATCATCGAGCCCCCGTAAAAACTTACCAACTTTCGTCGACATTTCCG 

ATGAGAAACTTGATTTTTTTTTCCTTCCGTTGAAATAATGTCAGATAGCTCGCAAATATCGGAACGAGCA 

AATTCTTGGTCCAGCACCAATAATTCGGAAAATCACACTCAGTTAATATTTACTTACAAAATAAATTTAT 

TTGTAATTTAATGGCTATAAAATGGGAACGTAGTAAGAAAATCAACAGCTGTTGTAATATAGCTAATGCT 

AATTCTTGATTAGTGTGGAAAGCCTAATAAGGTTATATTGTGCACAGGTTAACTACCTTAATATAGTTAT 

TGTTAATACAGTTATTGCTGTTGACTACTATTGTTATTGTTAAATTAAAGTGTTAGGTTGAGTTAATTGA 

TTAGTGAAAACCAACTAACTACCGTATTAAATTAGTGTATTAAGATTGATTCCTATTAAGGATAAAACAG 

AGAGTGTGTTAGAAAGAGAAAGGGTGGATTATAAATATGTGTAAAATCCCCTTTAGAGACTAATCACTAG 

AAATCTATTGATGGTTTCATATATAGAGATTAACGATTATATTTATAATATAAGTTGGTAGTTGCTAGTA 

TATTTGAAAGCACTACAGTATAGTATGTCAGAATCAGATCATTTAAACTCTACTAATAATACAGGAAACA 

CTTTCATTAGTCTAGATCAAGCCAGTACAATAATGGCAGATCAAACTCAAGGAGCTAACCCACAACAACA 

GCCTAGTCTTCTTGACACTAAAAAAAAAGAGATAAAAAACAATTTCAGCCAATCACATGTACTACATTTG 

TAATAGATTTTATTACTTCAGCTGCTTATTACACAAACAAGGTTGAATTGATATTGTGTAGAGTAAATTT 

TCGGAAATAGTTTGAATTGGGTGATCATTTTCTTTATTTTTTTTATGTCTTGTTTCTGTGAAGATCGGAA 

TGCCAGAGTGGAGCTCGTGAATTGCACCACTAATTGCAGCAGCACCATATTTCAAATAAAGTTTCTCATG 

TTGTAGTAAGGATTGCTTGTCTCCATGAAACCAATCACTTAACTAAGCCCCAGGCTAATTAGTGTGTCTT 

CAAACAGTTTTGTACTAGAGAAACTCAGACCTTCCCAGGGCAAGTAACAACCTAAAAAAATGCCACAAAA 



CTAAATGCAATTTCAGTTTGATATGATAGGCAATGACATCAACACCTGGAAAAAAAAAAAACTTTCAGGT 
GATGAAACGATTAAGGATTAAAGTTTGCAACGAAAAACAAGTGGAACTAAACTTTGCCTTATTGTTTTGT 
TCCGCTTACCTAATGATGTTTACTCCTTAGAACAAACAACATCAACTACTTTTAATCCTGACGACGAAGA 
AGAAGACCAAAAAGAATAATTAGCCGCAGCTACGGTGGTGGCACTAGTAGTAGTGCTAGTGCTTGTTGTG 
TCTCATCCAAGAGAAATGGAAAAACTGCAAAAATGCCGCAACTTTGAACATTTTGGAACACAATACAACT 
TTTTTTTTCCTTTTGGATTTACGATTAGCGCGATAGACGTGACCATAAAAATACCACACGATGTGTAGAT 
CCTCTAAAAATAATGTACACATTTCCAGGCTTTTGTTTACTGCTTAATAATTTGTCATCATCGGTAACAA 
TGATAGTCTCCCCACCCTAACTACAGTAGACGGAATTAGACACCAAAGATCTTATAAATCAACCCCAAAT 
TTTCCCATTTTGATTTTTGATTTTTTCGTATTCCTTGTTGTTTCCATAATTTTTTAGTTACTCCTCCTCA 
ACTAAACTAGATAACTCGTCACAGTTAACAACAGAAAGGTATGTTAAATATTTATTTCGTTCTAAATTCA 
AGTTTGGTATAGAATATTGCAAACAACAACAATCTGAAAAATGGACTTTAATTTGCTCTACAAAATGCAA 
ACACATCTAGAATTAATATTTGGTCTGGAAACCGTATACGGAAGTTATGGATAATCACGTTATCCTGATA 
TCTATTATTAACACCACCACAATATCTATTATTTCATGTATGGATTGCGGTGCCAAGATCAAAGAATCAT 
TTTAACCCGATATCTTACATTTCACCTCGATCTAAATGTGATTCAGTATCACCGGCTCATTGTTTCACCA 
CTCAACCTCCCCATACTGGGAGTACATAT 

>retrotransposon_24 4954bp public: 1..4954; zeta-like LTR: 256..763 
TGTTATAAAAAATTCAACCAAATTTTGGAAAAAATGA 

AAGTCACAAATGCTTGATTACTTAAAATTACTTATTTGCTTCAAGACGCTATTATTATTATTATGACATA 

ATACTACTTGAATAACAGTGAACTGTAATTGTATTAAGAACAAATCATAACAAAGGAAGATGATGACGAT 

GATGATGACCCCTTGAAATATCCAGGGCACATGCATTGTGATGATTGTTGTAATATAGCTAATGCTAATT 

CTTGATTAGTGTGGAAAGCCTAATAAGGTTATATTGTGCACAGGTTAACTACCTTAATATAGTTATTGTT 

AATACAGTTATTGCTGTTGACTACTATTGTTATTGTTAAATTAAAGTGTTAGGTTGAGTTAATTGATTAG 

TGAAAACCAACTAACTACCGTATTAAATTATTGTATTAAGATTGATTCCTATTAAGGATAAAACAGAGAG 

TGTGTTAGAAAGAGAAAGGGTGGATTATAAATATGTGTAAAATCCCCTTTAGAGACTAACCACTAGAAAT 

CTATTGATGGTTTCATATATAGAGATTAACGATTATATTTATAATATAAGTTGGTAGTTGCTAGTATATT 

TGAAAGCACTACAGTATAGTATGTCAGAATCAGATCATTTAAACTCTACTAATAATACAGGAAACACTTT 

CATTAGTCTAGATCAAGCCAGTACAATAATGGCAGATCAAACTCAAGGAGCTAACCCACAACAATGATAA 

TTCATCTTTTTTGTCAAGACGATAGTTAATGTTACAAGCACTTTATTGGGCTCGAAATAGTGGTAAATAG 

GTCCATAGATATGACCTGTTACAAGTTTATTTCGATGATCAAGCCGCCTCTGTGATTACGGCAATTATTT 

TACTATTGATAATGAGTAAAAGTTCACAACCAATAGAAGATATCCACCCAAGCAATTTCTCTCGACGAAC 

ATCTTTAGAATAGTTGGTATAATAACCTTACGAAACATTAATAAAGAAATTGTACCCGATCTTGTTTTCG 

AGTCAAAAACAAAGAAATCAAACCTAGAATCAACAATGTTCTAGCCATCATCTCCCGCCACCCAAGTGAT 

GTACCCCTATTTCTTGATTCTATTATTTTCTGACCCTGTGAGGGAACAAAGATACTATCTTTAATAAAGA 

AACAAAACCTCAACAACAACAACAACACACTAACACACTAAGAAACTAAAACTTGACGACAATATGATAT 

TGTGATATATTAATACTGCCCAACATTCATCGTCGTCAAATCAGAATTCAGAGCAAAAAAGAGACGTTTA 

CGTTACATTCCCCGATGTTTTTGTGACGTAACAAGCCGAAGAGAGGGAAAAAAAAAGTATGGTTATTGAA 

AATCTAGTTAGGATCTACTTTCCTTTTTGTCTCATCTATTTATCAAACACTATCAACGCGTTTTGAATTG 

ACGACCAGATCTATATCATCTAGTTTATAATATTCTTTGTCAGATCTGAATTGATCAATGTGTGGTTGTT 

GTTTGTAGTTTTTTGTTGGATTTAAACTACTCACAAACATCAAGCTTTTGAGTAAGAATTGAATCAAATT 

CAATATTGTCTTGTCACTTTTTTTCTGCGTGGTACACTACTACGAAACAAAATTTAAATTGTCGTGTTCT 

TTTTGATAATTTGTTTGTTATAATTTTTTTGCTTGTGTGAAAAAAAAAGAGAAATGATAATTCGTTTTTT 

TTATAGGGGTTTTTCTAATTCAACTCTTATAATAAATTAACTTATCAACACCGTAAATATAATTAAACCA 

ACTGTGTTGCGCCATAAATAAATAAGTTGTTTCGGGATCAACACATCTCCAACAAATTGAATCGTAGGTG 

AAAATTTTTTTTTTACTAGTAATTGGTAGTAATGGTGTTCACGAGTATTTTTTTTTGGGGAGTATTTGTG 

TCCCTTACAAGAAATAAAGCCAGGGCCATGAAAAAAAAATTAATACAAAACAAAATATTCGTATCAGCAC 

AGCAGCACTTCCCCCCCTTTCCCCTTCGGCACGCCCTAAAAAGAATTTACTCATGTAGTCGTTATCACTT 

CAACACCACACAAGAATACCTCGAGTGAAAGAAAATTGCTTGGGGAATGTGTGTAATTGGCTATGTAGAA 

TTTGGTATTAATAACATTTCTACTGTTTTTCTTGTGCCATAACATACTTTTATCGCGATATATTGCAAAG 

CCCCCCCTTCTAGCTCCTAATAAAAAAAACCCACATTACTATTATATTTAAAGTGTGAATTGGAGGGGAC 

AAAAACAGAACAATGAGCAATTTATAATAGTGAATAACCTTTAGCAAAAAAAAAACATTGTAAATTCAAT 

ATTTGACGATGGATTTAACAAACAATCAATCAAATTCTTAGTGTTGAACTGAACTGAAGTGATATTTTTT 

GCCATATGCACAAAATCTTAAATATTCAAGTCTACACGAGAAAACCCAAAAAAAATGTTATTGTTTCAAA 

AATTAATGCTTATGTAACACAACGCCAAATTTAAACCATTTTTTTTGTGGTTACTAAAAAAAAAAACAAA 

CAAAACAAATAAAAAAAAAGGATTACAAATTTCAGGCACATTGTTTAAATTTACTGACGCCAATTATTGT 



TTGATTCAAGTATAAGTTGAGAATGATTTTCCCAATTTATTAAAACTACATACAAAAGAATATTAACCTT 

TCTATTTTCTTTATTTTTTCAATTTAAAAGATATAAAATCGTTTCACCTTTTCTTTAAAATTATAATTTT 

CAAGACTTACCTTATTTGCGTTTTCTAATCGCGTCCACTCCTTTATTACTACTATTAGCTTAAGTCTTTC 

GTTCAAAAAACAACTACAATGCGTGCCAACTATTTGTTATTATTAGCTGCCACAGCTGTTCAAGCTGCTC 

CATTCATTAAGAGATATGAAAACACTACTGCTCCAGCCAGTCAATTGTCCACTTCATTGGCTGATGGTTC 

CACTACCATTCTTGGTTCTTCATCATCCAGTGTTGAAGAAGATGAAACCATCACTTCCACTATCGTTCAA 

TATGTTACTGTCACTTCTTCTGACACCACTTACGTTTCTGCCACCAACACTTTGACTACTACTTTAACTA 

CTAAACCAACCCCAGTTATCACCACTGAAGCTGAAGATGACGAAGAAGACAATGAAACCATTACTTCCAC 

CATCCTCCAATACGTTACTGTTACTTCTTCTGACACCACTTACGTTTCTGCTACTAACACTTTGACTACT 

ACTTTAACTACCAAAGCAGCCGAAGCTACTGAATCCGAAGAAGAAGAAAACGAAACTATCACTTCCACCA 

TTCTTCAATACGTCACCGTCACTTCTTCTGACACCACCTACGTTTCTGCCACCAACACTATAACCAGTGT 

TTTGACTACCAAAGCAGCAGTATCTACCAACGACGTCAGTGAAAATGCCAAGGCTGCTACTACTGAAGAT 

GATGGTGAAACCACTACTTCAACCATTACTAGTATCGTTACTATTACTGATGCCAATGGTAACACCGAAG 

TGTTGACCGAAGTTGCAGCTGAGACCAGTGGTGCAGAAGATGCTTCCTACTGTGTTCCTTCTACTGTCAC 

TGTTACTGTCACTGCTGAACAAACTTCCGAAGTTGTTTCAACTATTGTTCACACTACCCAAGTTCCACTT 

ACTGCTGAATTTACCCTTGATGATACCACTACTACCCTTACATCTTGGGTCGACTTGACTTCTACAGATC 

TCGTTACTATAACTTCTACTTCAAGTGTCTATGATTCATACTCAACTGGCGTTTCTCAATCCCATCCAAT 

TCCTCATACTCCAACTACACAATTTCGGACTATGCCCCACCAATCAGTTCTTACTACTCTTTGTAAAGAG 

CTTGATATGAAAGTTTGTGATAGTGATACTACTACCGCCGCCACCACCACACCTTTAGAGTAAAGATTTG 

TTTTTAAAAAAATCATTCTCATCATTTTTTTTTTATTGGTTTTCCATTTTATGTCGTTTTTGACGTTACT 

CATTTGTTTTTATTGTATTTTGATAACTGGGTTTATTTGAATTTTTGCTTTTTTTTATTTTTATTTTTAA 

CATTGTTATTCCTTTTTCCTTTGATTATTCCTTTAGTGGTTGGTGTTATTTTGATTTTTGCTTACATTTT 

TGCTTACATTGTTATATTTGTTATTCCTTTGTTAGAGTTTTTTTTTATTTTTGCCCTTTTCCCTTTTGGA 

TTTTTTTATCATTGTCTGTCTTATTCAATGGTTTTCTAGTCTAAAAATTTTGGTCTAGTTGCTATTTCAT 

ATCTCTGTTCATTATCTCTATCCTTTTCTTAGAAACATCATTCTCTCTCTTTCTCTCTAACATTCCTCTC 

TCTCATATTCTCTACAATTGTCTAGATAGATTTTTTATAGTCCTTATTGTTTTTTATTTCTCTAACTATA 

TGTATCATTTTTTATTCTTTTACATATATCTTTACTCTTCTTTCTCTTTTTATTTTTTTTGGATATAATA 

AATAAATATACATTTGCCGTGTTATATTCAAAGATGGATTGATATTGGAATTGGAATTGAAATTGGTGTT 

GCAAAAAAAATAGCAACCAAAAAAAATGACAACATCAACAACAACCACGAATAGGAGAAAAAATAAAAAA 

AGAAAGGGAAAGAAAGAAAGGAAAACAATAGAGGTGGTTTGATTACATAAGCAACCAAAATTTCTCGCGT 

CTTTCGCTCTGTTTGTTTTTCTGCCTTTGAAAGGGATGACAGCAGCAGAAAAGCAAGAAGAAAAAAAACA 

ACACCTACAATTCTTCATTTGTTTTGAGTTGGCCCTACATTCAAAGATCCAATTTAGCAGTCATCAAGAA 

TAATTTACAATCGATCGACCTCAGTCATCACCAAATAGTCAAACCAATTATTAA 

>retrotransposon_25 1047bp public: 1..1047; zeta-like LTR: 314..822 

TAATAATTGATTGGGTTTTTGGGAAATCACCAATTGTCTACAAATCTATCCATATATAACTTAACACTAA 

GGTTAACCTTGATCAAGAAGAAGGGAGTGGGGGGGGGGGTGCATTTATCCTTTATCTTGGCTATTGTGGC 

GATGCATAATTCGTAATATAACGTAATTAATGAGCAATTAAATAAATAAATTGATCTGATACAACAAAAT 

T^AAAAGAAGAAATTTAATTAATACTGTGGCACGTGACAGTTGATTCTAGATCAATTCATAGTCCGCGTCC 

CCGAACCGAACAAAAACAGGGCAAAATGATTACTGTTGTAATATAGCTAATGCTAATTCTTGATTAGTGT 

GGAAAGCCTAATAAGGTTATATTGTGCACAGGTTAACTACCTTAATATAGTTATTGTTAATACAGTTATT 

GCTGTTGACTACTATTGTTATTGTTAAATTAAAGTGTTAGGTTGAGTTAATTGATTAGTGAAAACCAACT 

AACTACCGTATTAAATTATTGTATTAAGATTGATTCCTATTAAGGATAAAACAGAGAGTGTGTTAGAAAG 

AGAAAGGGTGGATTATAAATATGTGTAAAAATCCCCTTTAGAGACTAATCACTAGAAATCTATTGATGGT 

TTCATATATAGAGATTAACGATTATATTTATAATATAAGTTGGTAGTTGCTAGTATATTTGAAAGCACTA 

CAGTATAGTATGTCAGAATCAGATCATTTAAACTCTACTAATAATACAGGAAACACTTTCATTAGTCTAG 

ATCAAGCCAGTACAATAATGGCAGATCAAACTCAAGGAGCTAACCCACAACACCACTCAGATTTAGCCCC 

TCTAAAATGCATATGGCACAATGATCTCACCTCGGTTGGTTAAACCTTTTTCTTCTTATTAAATCTATCT 

TAGTTGTAGGTTGGTCTCCCCCCCCTAACTAGTTTTACAATTCAATTATTAAACCAATTGTCAATTCTTG 

GTATTTTGTAAACAAGACTCATTAATAATCAATCGTCAATGCATATGATCAAAACAAATAGAAACTT 

>retrotransposon_26 7929bp Incyte: 1..7929; zeta-like LTR: 3346..3853 

AAGAGATTGTAGTGAAGAATTCAGCTCATTATTACTGTTTTGTCGTTGCTGGAAGGAGGAGGGATAATTC 

AATGCGCCACAACAGTGTTACTATGCATGTGGTTCTGACTGACTGATATTGTTTAAAAATTAACCAGCTC 

TCAAATAACAAAAGTTTAAATTTTCAAGGTTTGTAAACATGGCAGCTAGTAGTAGGATGGTTCATAATAT 

TAATTAATTATTAGTAATAATGGCTAAGTTTTTGAAGCATTGTTTTAAATTTTCAAATTGAAATTCAATT 



TCATTACAAATGGATTACTAACGGAATTCCTAAGCTCAACTGAATACCGTGATTGAAACATTTGAATTTG 

TATCTTTTAGATTAGCTATTTTTACTTTTTTTGTCATTGTAGTTGGTTATGATAATTACAAGAAACTAAA 

GTTTAATATTTTCATATTCATTTTCTTTTTTGGCCAACTTGCAAATAACACACAAACCCAAAATTAAATA 

ATTAGATTTAATGCATGCATAATTACACAGAATGTTTAGCCTTAACAAGTATTCTAGAAACAAGAAAGAA 

AAAATGTCGTCTTGGCGTTTATCTTAATTGTATTCTGTAAACTGGGTTAATTCTTATTTCCAACTTTTCA 

TTTTTTTGGATCTTGTATGGATTAAAAATTAAATATGGTATGTTTTAGGGTTGTATTAACAATACTTACA 

ATTATCAATCATACAGCTTTACTATTTTTATTTATCAGCAAATAGGGGAATTCAAGTTGCATGTGTTATT 

CAGTGGCAGTGAATCATAAAACAGCCAACTTGCAGCTTATTTCACTCCAGGAGCAATCATCACGGAATTC 

CGTTTCCCATCTCATTTTCATACTCTGTGGATTATGTATAGAGGCTATTTACAATATCACCAAGCAGTAA 

AACATTCTCTCCTCAAAATAACAATAAGATTAGTCAAGATGAACGACTTGAATCTATTCATATGCATTAC 

ACATTTAGTTTCTATTACAAATAGTGATGCAATGGTGCAAGATTACGTCTTGTCTGCACTAACTATTTGT 

AACGATGATTATGTGATCAAGAATTGGAATTCTTATTATATTCAGTCGTGAGTGTAAGCTATTTCGTTAG 

GGTTATCTTAACTCGAAGTTAAAGTTCCAAAACTATTCCATTTGGAGTTTCTGTTGTTGAGAAATACAAA 

ATACTCTTCTTGGTGGGGAGGAAATCCATTAATGATTATAAAATGAAACTCTTGGTAACCTAATTGAAAC 

ACCACATTCAGTACATTTTCAACCGTCACTATTATTATTGTGGCAAATGGATTAAACAATAGACCTAACT 

TAATCTAATGGAAATTTTAAATCCATGAAAGGGGTGAAAATTTGAAATCAAAATAACTATCTGAACTGAA 

ATACCCCATGGATCTGATATCTTATACAATCTATCAACTAAACAGGGAAGAQTACCTGGAATTCCAAATG 

ACAATTCCTATTATAATTATTTAAACAGACTATGCCGTATTGTTTGTGACATTCATTGTTTTCCACAACT 

CTAATGTCAAATTTTTGTTATTGTCATGTAATCCCGGTGTTTCTTTTTTCTTTTCGGTGTTGCGTTCCAT 

GATATTTTGTTATCTCTTGTTTAGATTGAGATAAAGAATTGGTTAGCAGTGTAGCCATTTATGAGTGGTT 

TGTAAAAACAAGAATTACAAGGTTTGAATGAATTCCAGGCAGGCAGTATTATAAAACCTCGAAATAACTA 

ATCAAACCATCAGAAAAGAAAGCTTACTATGATGTACTGCTTAATCTCATATCTATCTTACAAACTTAAT 

TCACTGATTGTGGCTTGTCCGTGAATAATTCGGAAACCTTGTCTTTTTCGGTCCAGTAGGGGGTGCCATA 

GTCTTGGGTGGTGACAAAAAAAAAAAAAATTATAGTTGGGGTGGTGGGGTGTACGTCTGAGTAAGTCAGG 

GGAATGAACTCAAGACAAAAATAGAAGTTCTAAACATGGTACGTTCTGCTAAGTAATATCATCGATCTAT 

CTATTTTGCTCTAAATTTTCATAAGCAAATCCAGAACTTCCTCGTCAGTTTCAATTTCAAGCATACGAAG 

GGATAGTGATTAAATTATATTTTGAACCTTCTATTACTGATTAAGTGTTCCTATTAGTCTACGGATTAGA 

CGGTTAGAATGGGATTTTCAAAAGCACAAAGGTCAAGACTTATAGGAAATTCATAGAAAAAACACTCTGA 

AGTACTCGATGGTTGGATATATAATAGTTTTGCTAATTTAAACTCTTGCTGTTCGGCTAAGCTATTGTAC 

CCAAATGCGGTACTCCGATAGTCTTATAAATAATACTTGGCAAAAGTTCAATAAATATATGTCAATGGTA 

TTGCTTTCCAATTACCATTGACGAGGTTGTAAATTAATTCATACTTAGGTGACATCGATTAATTTAACAA 

ATATGTCTGTTTCAACGCTTACATCATCAGTCTTGCAGGAAAAATGTTATTGCCACGACACCTCAAATTA 

GCCCAACCCCTTCGTCTACC^AAACAATGTCAAAAACCCACTTAAAAGAAGTCGGACAAACCTGAACCCG 

GTATTTTATAAAGTAGTTTTGTGAATAATATCAGTACATCGATTACACTTTCCGTCTCAAGACTGGAAGT 

TGCAAAGCCATGACAATTGCTCAACCAAATGTGAATTTTTAGGTTCCATAGTCTTGATCGGGTAATGTAA 

ACACTTTAACTTTTAGTAAATGATACCACCAAGAAGAAAGCACTATTTTAAGCTTTATTTAACACTATAC 

ATTGGAAAATAAAAAAGTGGCTATGAGAATTAAACAAGATGACCGAGTAATTAAAATAGTGCTGTCGGTG 

TTAAGCAATACCGCTAGGGTTCAATCAATTAAGTGCTGCTTTTTTTTGTCGTTGTATTTCCATTCCTCCA 

CTCCTTTCTTTACTCTTGCAATCTAACATATTTTTTTTAAAAAGAAAACATATTGATACTTACATGTGGT 

AACTATTGTCTGATTCATCAATTCCGCTCTTCAATCTCGGTGTTCGGATAATTTCGATGAAATTATAATT 

ACCTGCCGCAATTCTAGAAATTCCTTTTTTTTCTTTTCTTTTTCTCGGAGTTGGTTCCAATACAAAGATT 

GAATTGAATTAGGTGAGAAGAAGAAGAGTCTTAACACCAGATGTATTACAGCTTTAAACTTTGTTTCTAA 

TTTGACCACAAAAAGTTGTCTGGACGCCTCAGTTTGAAATTAGTTTTGGGAGATTTCTGTTTTCTCATTG 

GCCTTACTCTATGGAAGTTTTTATACAAGAGCTTCCTTCTAAAATTAACTCTTTGTGTTGTAATATAGCT 

AATGCTAATTCTTGATTAGTGTGGAAAGCCTAATAAGGTTATATTGTGCACAGGTTAACTACCTTAATAT 

AGTTATTGTTAATACAGTTATTGCTGTTGACTACTATTGTTATTGTTAAATTAAAGTGTTAGGTTGAGTT 

AATTGAATAGTGAAAACCAACTAACTACCGTATTAAATTATTGTATTAAGATTGATTCCTATTAAGGATA 

AAACAGAGAGTGTGTTAGAAAGAGAAAGGGTGGATTATAAATATGTGTAAAATCCCCTTTAGAGACTAAC 

CACTAGAAATCTATTGATGGTTTCATATATAGAGATTAACGATTATATTTATAATATAAGTTGGTAGTTG 

CTAGTATATTTGAAAGCACTACAGTATAGTATGTCAGAATCAGATCATTTAAACTCTACTAATAATACAG 

GAAACACTTTCATTAGTCTAGATCAAGCCAGTACAATAATGGCAGATCAAACTCAAGGAGCTAACCCACA 

ACAACAGCCTAGTCTTCTTGACACTAAAAAAAAAAGAGATAAAAAACAATTTCAGCCAATCACATGTACT 

ACATTTGTAATAGATTTTATTACTTCAGCTGCTTATTACACAAACAAGGTTGAATTGATATTGTGTAGAG 

TAAATTTTCGGAAATAGTTTGAATTGGGTGATCATTTTCTTTATTTTTTTTTATGTCTTGTTTCTGTGAA 



GATCGGAATGCCAGGGTGGAGCTCGTGAATTGCACCACTAATTGCAGCAGCACCATATTTCAAATAAAGT 

TTCTCATGTTGTAATAGGATTGCTTGTCTCCATGAAACCAATCACTTAACTAAGCCCCAGGCTGATTAGT 

GTGTTTTCAAACAGTTTTGTACTAGAGAAACTCAGACCTTCTCAGGGCAAGTAATAACCTAAAAAAATGC 

CACAAAACTAAATGCAATTTCAGTTTGATATGATAGGCAATGACATCAACACCTGGAAAAAAAAAAAACT 

TTCAGGTGATGAAACGATTAAGGATTAAAGTTTGCAACGAAAAACAAGTGGAACTAAACTTTGCCTTATT 

GTTTTGTTCCGCTTACCTAATGATGTTTACTCCTTAGAACAAACAACATCAACTACTTTTAATCCTGACG 

ACGAAGAAGAAGACCAAAAAGAATAATTAGCCGCAGCTACGGTGGTGGCACTAGTAGTAGTGCTAGTGCT 

TGTTGTGTCTCATCCAAGAGAAATGGAAAAACTGCAAAAATGCCGCAACTTTGAACATTTTGGAACACAA 

TACAACTTTTTTTTTCCTTTTGGATTTACGATTAGCGCGATAGACGTGACCATAAAAATACCACACGATG 

TGTAGATCCTCTAAAAATAATGTACACATTTCCAGGCTTTTGTTTACTGCTTAATAATTTGTCATCATCG 

GTAACAATGATAGTCTCCCCACCCTAACTACAGTAGACGGAATTAGACACCAAAGATCTTATAAATCAAC 

CCCAAATTTTCCCATTTTGATTTTTGATTTTTTCGTATTCCTTGTTGTTTCCATAATTTTTTAGTTACTC 

CTCCTCAACTAAACTAGATAACTCGTCACAGTTAACAACAGAAAGGTATGTTAAATATTTATTTCGTTCT 

AAATTCAAGTTTGGTATAGAATATTGCAAACAACAACAATTTGAAAAATGGACTTTAATTTGTTCTACAA 

AATGCAAACACATCTAGAATTAATATTTGCTCTGGAAACCGTATACGGAAGTTATGGATAATCACGTTAT 

CCTGATATCTATTATTAACACCACCACAATATCTATTATTTCTTGTATGGATTGCGGTGCCAAGATCAAA 

GAATCATTTTAACCCGATATCTTACATTTCACCTCGATCTAAATGTGATTCAGTATCACCGCCTCATTGT 

TTCACCACTCAACCTCCCCATACTGGCAGTACATATTTTTTTTTTCATTTTAGAGAGTTTTAACATAACT 

TATCGGCATTTTCAATAATGTTTATTTGGAAATTTAGTATATACCGATAAATCCTGAATTCTCGTATTGG 

CGATGGATTTACCAAAAAAATGGGGAATGAGTGTACACCAAGAAAAAAAAGAAAAATTCAAGAAAAAGCG 

AGTGACTAAAAATGTCGTGGGAATTTAATTTATCCTGGAAAGATGCCCCGATTCAGAAGTAATGTCGAGT 

ACTTTCACCCACATACAATGAACGACTTTTATTTATTCCTTCACCCCACACAGCAACAACTACATTTAAA 

TTTCAGTATTTAAGCGACCATGAATTTAAATTACAATACTCCACAGATTAAAGCATTTTGTTTATAACTT 

TTCTATTCTTATCAATTTTTTTTGGTATAGTTGTGGTTTGCGTCACGGTTGTTTTCTTTTTTTCATTTTC 

CTTAGTTTACTCCACATACACATACACGTACATTTCTATATATACCCCATGATTCCCCCCCCATTTGATT 

TTTGTTGTTGTTGTTCAGCAATATCTACTTTATTTATTGGTTTTTATGTTTATATGATACTAACTTGTCT 

TTGTTTGCTTTAGTCATGAACTCCGATATACCACCTCCACCACCACCTCCAGAATATACCCAGTCCCATG 

AAGATTTACCAGCATACACTTCGTCGTTGAACTATTATGGATTATCATTGATTAAAACAGAATTCATAAC 

CCCATATCAATACAATAGCGGTAACCGTTCCTGGAAACCAGTATTGCTTGAATTGAACTCTACTCAATTG 

AAAATATACAACTTGAACATTGATAAGAAACTACAAGATTTGCTAATATGTTTATATTTTGAATTAAATT 

GTTTAGATCAATTAACTAAAGACATCAATTCTCATTATAAAAAGAGTAAAGGTTTTGACTTTAGTGAATT 

ATCGTCTAATGATGCCGACGATGTCGGCGATTTGTTTTCCGGTGATGCATATGGTGGTACTGATAGCTCC 

AAGTTATCTTTAAATGATTCCAAGTTTGGCAAATTGAAAAACAAATTGAGAAATCAAAAATCTAATAAAA 

CCTTGCAATCAATAAAAGCTCATTACGATGAATTAAAAGATAACAAATTTTTCTTTGAACCAACATCCTC 

AACAAAGGAATATAACCAATTCGCTAAAAAGTATAGAGGAAATTTGTTGCACTGTTATTCTTTGGCAAAC 

TTGCAGATTGGGGAAGCACCATCTTTGAACCAAATAATTTCAGCAATCTACAAGGAAGAGCATAATGGCA 

ACACCAACAATTCATCACTCGTCAAATACAAAAACACATTGCGTCTTCGAATTGAATATAAACAAATCTT 

ACTTCAATTTTGGTCTTTCTACGGTATGATCAGTTGGTTTAGGAATTTCACCATTGGAAGAGATTTGAGT 

GTACCCGTCGAAGCAAGACATGTATCGAAACTCAAATCTATACCCTCAAGAAACACTAGTCAAAACAATG 

CATTATTGGCCGCTACTGCCGCAGCTGCAAACTATGGAAGAAACAGAGCCAATACTCCAGTGGACGGTGT 

CGAAGAAGACATATCCATGTTTCGCTCCAACTATTTGACTATTAAAGATGAAGATAATACTCATTCTGAC 

ACCAGTAGTGAGAATTCATCTGTGTTCGACAATGAGAGAAGAGGGTCCATAGTTTCAACAACTACGTCAA 

TCGAACCAGTCGACTATGTTACTATTAACAATTACAAGTTTTATTCCCAAGAGTACACCTTTACCACTGT 

TGAGAAACAATACATTTCCAATTGCATACCAGATTTGAACTCTTTTGATAAATGGAATGGCAAGTTAATC 

ACCGTCAGTAACGTGGATCATTTTATTAGAGATAAGAGATCTTTTGAAGACAAAGATGACGTTTTCATTA 

GTTATGCTGCATTGGGGAACTTGGTACAATCATATGATAAAAAATCACATAACGACTCATCCATGCTTAC 

CACCCAAACTTTTATCATTCATCAAAAAGGGTTAGTTGGTTTAGGAACACAAGTTTGATTCTTAAAACAT 

ATATAGATTGATAGATACCATTTAATATTTCTAAACATATCTTTACGAATTAATAAATACGACTTTTAAT 

GATATAAGGTATTTTGGTTGTAATTGTAGATTTGGCAAAAAAAAAAAAAATAAACAACCATCGTAGTAGT 

TGTTGTTACAGTGGTTCAAGTTCACGCCCTAAATTCTTGTGGCTGTCTCGCCTTTAACTTTCTTTCTTCC 

TCCCTTAACTTAACATGTACGTGTACTTAATATTATTTTGAAAAATTTTTTTTTTCTGTCTGTTTCTCTC 

TCTCCTTTGTTCCCAACACCAGTTGGTACTTTTAATTCTATTTTATTTTTACGTTGATCTGATATTTATT 

TATATATTTATATATTTCCATCAATTCTAAAACTTAATTACTTCAAAGACCAAGTTCTTGAATCTTCTTT 

TGTTTTTGCTTGTTTGTATACCAAAACACTCTTTTTCAATTATTTCCCTGCTGTTTTTCTTTAGAAAAGC 



ATTGTCCATTTGTCTATTAGTCTGTAACTGGAAATTTGTCCCGTCCTTAftATTATTTTTTTTTTGAAGAA 
TCTTTTCATTTGAATCATT 

>retrotransposon_27 2292bp Incyte: 1..2292; zeta-like LTR: 1327..1834

GATATTAAGTCGTCTAATGCTATTTTTTATTTGAAAAAAAAAAAACAAGAAAACAAATGTATAAAGGTGG 

AAGGAAAATAAAAATTAAAAAAAAAAAAAACTCGAATATTAAAATGAAAGTGGACAATTAATTGATTGAT 

TAATAAATTGGTTTTATTAGTATTATGTAAGGGATTTCAAAGAAGTCATCTAAAAATTGTTAATGTAGAT 

GTAGATGTAGATGTGGTTGTTGTTCTATGTGTTTACAGAAATTGATCATCAAAGTCCAAGATTTTACATT 

GCCTCGCCAGTTCTATTTTTATAAATATTGGCTGTGTGTTTTGGGTGTGCTTGGGCCGGGCAGAGGGTGG 

GAGAGAGGCATGAATGCGGAAGAGGAAGGAGGTCATTCCATTCCATTCCATCGCCTCATTCTTCTCCATC 

GTTCATTCATTTAATTACGACAGCAGCAGAAGAAAAAAAAAAAGAATTCAGATGTAGATCACGTGCCAAT 

ATTATGAAATATTCCATTTTGGGAAAGTCAGCTTCAATGGCTTACATGGTAGCGCATACTCATAGATTTT 

AAAAAATCTGAATAATTTGTTAGTTCTCTATGAATGAATAAACAGATTACTGATAAGAACCAGATTAATT 

ACTTAGAGGTTTTCTTATTTTTTCTTTTTTGATAGCAAAAGTATTCATGAATTATTCGTATTCGTAAAAA 

ATTTAAGAAGGAGGGAGAACAACAACTGTTAACCCAAATGGTGTTTTTGTTAAAACTCTATCTACTAAAT 

TCAACATTTGTGAAGATAAAAGTGGTTCAAATTTTTTGTATGAAAAAACAACATAGATTTATATAGCAAC 

ATCACTACAGTAATATATCGAATACAATAAATATATATATATAATAAATTAAAATAAAAATAAAAATATA 

CATCTACAATATGAAAAAAATCATTTAACTATATAGTATGTCTAAATTATCGAATGAAAGTTAGTAATAC 

AAACTCCCATGTTTAGTGGGGAGCTTGGTAGAGCCTTCAAGGCAATTCATAGTAGGTTGGAGGAGGCCCT 

AATCAGAGGGTCTGAGTTGAACAAAAGCGCCCAAAGCTTTGTTTGATTCATTGGAATATACTCTCGGTTA 

TGTCGAAAGTATTGGAGCTGAAAATAGAAAAGAAAAAAGTGAATAATTATGATAATTATTGGTGTGATTT 

TGTCACCTTTTTATACCCAATTTTTTTTTATCAAGAGAGATTCTTAGATTTGCCATTTTGAGTGTTTCAA 

ATTTCCCATGTGGATTGAATTTTCAAAATTGGTTACATATATCCTTGAAAGTGTTCATAATTTTTGTGTT 

GTAATATAGCTAATGCTAATTCTTGATTAGTGTGGAAAGCCTAATAAGGTTATATTGTGCACAGGTTAAC 

TACCTTAATATAGTTATTGTTAATACAGTTATTGCTGTTGACTACTATTGTTATTGTTAAATTAAAGTGT 

TAGGTTGAGTTAATTGATTAGTGAAAACCAACTAACTACCGTATTAAATTATTGTATTAAGATTGATTCC 

TATTAAGGATAAAACAGAGAGTGTGTTAGAAAGAGAAAGGGTGGATTATAAATATGTGTAAAATCCCCTT 

TAGAGACTAATCACTAGAAATCTATTGATGGTTTCATATATAGAGTTTAACGATTATATTTATAATATAA 

GTTGGTAGTTGCTAGTATATTTGAAAGCACTACAGTATAGTATGTCAGAATCAGATCATTTAAACTCTAC 

TAATAATACAGGAAACACTTTCATTAGTCTAGATCAAGCCAGTACAATAATGGCAGATCAAACTCAAGGA 

GTTAACCCACAACATTTTGTAGTCGTAAACTTGAAATTCAAAGAGAAGGGGGGGAATTAAATTGGGTGCA 

ACGTGTTTGTCAAAAATTTGGTGTGAAAAAAATTAATTTAACACTCTGCATTGTACCATAGGGAATATAA 

TACCCAGAAATAAGAGAAATTATCACGTGAGACTAAAACTAAATATAATAAATTAATATCACAATTGAGA 

AAGACACTGAAACTAACTTCTTGGTGTATTAATTTTCAACACTTGATCACAAGTGCGGGGATTAATCATA 

ATTGCAAAGAGTGTGTTAGAAAGAGCGAAGGTGGATTATGAATATTGGAGAATCCTCTTTAGAGACTATC 

CGCTAACAAAATAGATGAACTTGCTCAACAGAAACAACTAATCGACTAACTGACTAAAATTAATATACTA 

AGTATAGATTTAGTTATCACGTTAATATTCTATACTATCCATCTCCATCACT

>retrotransposon_28 2025bp Incyte: 1..2025; zeta-like LTR: <794..1294

TGGGGAGCAAATGTGAAATTAAAGAGTGTGGTGATATGTAATTTTTTTTCAAAAAAGATTGGATTGACGA 

AGCATTATATATTCGTCTAAAAACCATTTTTGCTGGTTCCGCTATAAATCTCGGAGATTATTTCTCGATT

ACCAATTTATGTTGTTTTGTGACATTTCTTATATTTTGTTCTATTTTACACGACTATTTATTGTTAATAA 

ATATGTCACCTAAAGAATATTTCTATTTAGTTTTACATATGTTTTTTGACGACAATCAACTATTACAAAT 

TAACCTACATTTTTTAATTTGAATATATACAATTTATATTGAATTAACATTACCATTTAGTTTTTGATAA 

GAATAGATTGCGCTATTTCAAACATTTGTTAAATTATTTATTGTGAAACAACTATGTAGAATAAAAGTAT 

GAACAAATTCTACGTTCATCATGTGGGGTGTGCCTTCATATATATCTTTGGATGAGAATGCCAAGAAAAA 

TGATGGCGTGACAATTCAATACGGCAAAACAAACTAATCCCCTCTAAGATTTTACTAGTGTGTTTCCCTA 

TCGTCTGAGGAAAAGGTAACAAAACATCGTTTAACCAATTGGTGTTTGTTACGATGGTGACGTTGAGTAC 

TGCATATAGTTGCAACGGCAAATTGCATCCAGCGAGTTAACAGCGAATGGCAAAGTGAAGCCTCCGACTT 

GTGTTCATTGACTACTGGGATTGGACTGGGAATAACGACTTAACTAATTAATGTTCTCGTGGACTCGTTT 

AGCTAGAACTAACATTTGTTATAATATAGCTAATGCTAATTCTTGATTAGTGTGGAAAGCCTAATAAGGT 

TATATTGCGCACAGGTTAACTACCTTAATATAGTTATTGTTAATACAGTTATTGCTGTTGACTACTATTG 

TTATTGTTAAATTAAAGTGTTAGGTTGAGTTAATTGATTAGTGAAAACCAACTAACTACCGTATTAAATT 

ATTGTATTAAGATTGATTCCTATTAAGGATAAAACAGAGAGTGTGTTAGAAAGAGAAAGGGTGGATTATA 

AATATGTGTAAAATCCCCTTTAGAGACTAACCACTAGAAATCTATTGATGGTTTCATATATAGAGATTAA 

CGATTATATTTATAATATAAGTTGGTAGTTGCTAGTATATTTGAAAGCACTACAGTATAGTATGTCAGAA 



TCAGATTATTTAAACTCTACTAATAATACAGGAAACACTTTCATTAGTCTAGATCAAGCCAGTACAATAA 

TGGCAGATCAAACTCAAGGAGCTAACCCACAACAGCATTGATTATATAATCATCTATGTAGCCAATATAC 

ACTACCGTCCAAACTCCCACTACACACTTGTAACAGTGTTTTACAAATCTATGAACGAATAACCGATTCA 

AATGACACAATAAAGAACATTTCACCGATTTGAATTGCTAATCGGTACTATAATATTGATGGAAGGTTAA 

GAGTTTAATGCTACCCTAGGTTTACCGGAGATCAACAGTTGCATATACAAAACGTGTTATCTGTCTACGA 

ATGGCTTTCTATGTGTATAAAATGTTTCATCAATTGATAATTAATTATTAATCTGCTTACTGAGGTAAAC 

CCCTTTTAATGCAATAGCAAATATGAGGTATTTTTTTGCTATTGACATGCGTATATGAATCCATTTGTAT 

CAAATTGCCGATATAATGAAATGGAAATTAAGGGAAAAAAAAAAGTTTATATCCAAATTCATGCGATTAA 

CAGGTTCTTGTGATTATAATTGGTAACCCCCTCCCCCCTAAAACTCATATCTGCCAAAAGAGGAGGATAT 

TTGAATATGCTATTATGAACCCCATTGATTTTGACTACAATTGGATTTGTCGGGTATTGAAACCCAAACA 

TATTATAATTTGCTATGCGTTTAAATCAACCGTTTACTGGTAGATCCTATACTATAAATACAGCCAACAA 

TCCCCAATTGTTCAGATAAAGTAACACTCAATATCATTTGATCAATCAATCAAGAGGATTACAAA 

>retrotransposon_29 2731bp public: 1..2731; zeta-like LTR: 380..887

ACATATTTTTTTTTAAAAAGAAAACATATTGATACTTACATGTGGTACTATTGTCTGATTCATCAATTCC 

GCTCTTCAATCTCGGTGTTCGGATAATTTCGATGAAATTATAATTACCTGCCGCAATTCTAGAAATTCCT 

TTTTTTCTTTTCTTTTTCTCGGAGTTGGTTACAATACAAAGATTGAATTGAATTAGGTGAGAAGAAGAAG 

AGTCTTAACACCAGATGTATTACAGCTTTAAACTTTGTTTCTAATTTGACCACAAAAAGTTGTCTGCACG 

CCTCAGTTTGAAATTAGTTTTGGGAGATTTCTGTTTTCTCATTGGCCTTACTCTATGGAAGTTTTTATAC 

AAGAGCTTCCTTCTAAAATTAACTCTTTGTGTTGTAATATAGCTAATGCTAATTCTTGATTAGTGTGGAA 

AGCCTAATAAGGTTATATTGTGCACAGGTTAACTACCTTAATATAGTTATTGTTAATACAGTTATTGCTG 

TTGACTACTATTGTTATTGTTAAATTAAAGTGTTAGGTTGAGTTAATTGAATAGTGAAAACCAACTAACT 

ACCGTATTAAATTATTGTATTAAGATTGATTCCTATTAAGGATAAAACAGAGAGTGTGTTAGAAAGAGAA

AGGGTGGATTATAAATATGTGTAAAATCCCCTTTAGAGACTAACCACTAGAAATCTATTGATGGTTTCAT 

ATATAGAGATTAACGATTATATTTATAATATAAGTTGGTAGTTGCTAGTATATTTGAAAGCACTACAGTA 

TAGTATGTCAGAATCAGATCAATTAAACTCTACTAATAATACAGGAAACACTTTCATTAGTCTAGATCAA 

GCCAGTACAATAATGGCAGATCAAACTCAAGGAGGTAACCCACAACAGGTTATGAGCCTCGCCCGCTTAT 

TGAATTTAGATAATATAGGGGCAATGAAAGCTTTTGAAAGTGTTGATTTTCCTGAATCATTAAAACTAGA 

ATCCAAGATTAATTTTCAAGTGTGGAGAAATGAAATCCTTAGATATGCACGTGGTATTGGTGCTGAGTTT 

GAAAACTTTGTATTGAATGAAACTCCAGCTCACCTGTATGATCTTAGATTGGGAAATATGCTTCATCAAT 

TATTGATTCGCACTGTGAAAGAAAAAGTTAGAATGCCTAGGCAAGAACTTGGAAAATCAGGAAAAGAACT 

TTATCTTGATCTTATTAAATCATTCGGTACTCAATACCCATACGATAAATTTGAGATAGTTAAATACTAT 

TGGGATCAGTTAACAAACCCTTTAATTAATGTGAAGAGACGTTTTGAAATTGAAGAAGTATGGGTTCAAT 

ACATTAATGCTCAAACTGCAACAGAGAGAGAAGTTCTTAATTCATTTGTTTGGTTACATTTGTCAAAATC 

TATATTACCACAAGAGTACCTTAGAAGTGCCCATCCAGTTCTTGATAAAAATGTGATTAAAATATTTCTT 

GATACCCATCCAAAATGTGATATTGATCAAATTATGTCATTTGTAAATAATGAACTGATTAATTATGTAG 

GGAAAAATGATACAAGGGAAAATGATATGGGACAGAATTTAAGAGAGAGTGATTTAAGAGAGAGTGACTT 

AAGTGAAAATGATATACAACAAAATGAGTTAAGCGAAAGCGATTCAAGTGAAAATGATTTAAGAGAAATA 

GCAACAAAAGAAACTGTTAGTGAACTTTTTGAAAATCAATGTCAGAATTGTTTTGGACTTGGTCATGATT 

CATATGAATGTTCACTGGCATTTAGAAACAATCAGTATATTCCAGATTTATTTTCTAGACTTCAGAGTTT 

TCGTGGAAATAGAATTCAAAATAATAATAGAAATGTCTGGTCTAGATTCTCAGAACAAGATGAGTCAATT 

GCAAATACAGAAAAAGGTAACTAGATCTAATGATAAAAATGAAAATCAGTGGCAGTCAAAACAATTTACA 

TATTAAACAAGTTTGAATGTAAGTTGTTGTTGTTTAGATAAACTATGTCATGGTATCCAAAGTTTTATTT 

TATATTTATTATTTAAGTGGTCATGTTTATTTACTTATAATTGTTATTTAGTTTTTCAAGTGTGAATTTT 

ACTTACTTATAATTGTATTTAGTTTTCAAGTGTGAATTTTACTTACTTATAATTGTCATTTATTGTTCAA 

GTGTTATTTTTACTTACTTATAATTGTTATTTAGTTTTCAAGTGTGAATTTTACTTACTTATAATTGTTA 

TTTAGTTTTCAAGTGTTATCTTTACTTACTTATAATTGTCATTTATTGTTCAAGTGTTATTTTTTACTTA 

CTTATAATTGTTATTTATGTGTCCAAGTTTTAATATTATTTACTTATAATTGTTATTTATTGTATATGTG 

TTAATTTAATTCAATTGTTAATTGTTATTTATTGTTCAAGTTTTAATTTTATTTACTTATAATTGTTATT 

TATTGTTTATGTGTTAATTTAATTTAATTTAATTGTTATTTTTACTATTTAAATGTTGATTTTATTTATT 

TAATGTTAACTTGTCATTTTTAATTTTACTTATTATATTTTACGTGTGACTATTATCTATGATAAAACAC 

TAATAGTGGATATTGAGTGTTTATTTGTTTCATCGCAGAGGATATTTATTGGAGGAGGGAGAAAATGTCT 

ATTTGGTATAAGGAAGACCATAAAAGTTGGTTCCAAATAGTCAACCAACCAATAAACATTCCCTCATGCT 

T 



>retrotransposon_30 2858bp Incyte: 1..2858; zeta-like LTR: 814..1321,
reverse transcriptase fragment (contains stop codon): 635..>537

CCTCCGGGCGTCTATTTACAAGCTGCTTTATTATTTGTTATTACCTGGGTGTAAAAGCCCTCTTGCATTT 

GAGCTATTTCTATTCCCACTTCGGTATTTTTTTTACAGCCTCGTTAGACGAGTTCTTGATATTACTAAAT 

TAGTTGTTTACTGAGTGGCCTGATGGTTCCTCGTCACTCTAGTTTTTGGTCTATATAAGGGTCAGAAATT 

TCCCTTCTCCTTAGGTCCATCAAGTCAAGATATACATTAGTTGGTAGCATCGTATGGAATTTTCGTATGA 

ACGGCATACCAAGTATTAATTTCCGATCGAAATTTTTTAGGACGTCTTGATAATCAGGACAAACATCATG 

AAAGGTCTATACGACGAAAGTTTACTTTACACAAGGGGAGACCATATGTCTTCTTTATTAACAACTAGTT 

ATATAGCGAACAAATAAGTTTATACAGAAATATATGTACACAAACAAAGTTATTGTTTATTAATTATTTA 

ATTAGCTCGGAAGAATAACTCTGTGATACTGCATACATTCAAACAAAATCAATCTAGTTTCCAACATCTT 

TTTCACTTGGTAATGTAATTATTCTTGTTCTGGCACCGACAATGGGTATTGTTTTGTAGCTGGAGGACTA 

ATATGGGGTACCACCTCAATTTTTGGATCCCAGCTCCCACGCAGGGGTGGCTTCTGATCTAACTCACTTT 

CGAAAATATCCTGATAGTTTCCAATTAATTCAGCAAAATAGCTCTTGTTTGTACCCTTAACCAATGACAT

GATATCCTTTTTATTATCACCGATACCACCTGTGTCTTCGTCTTGTTGTAATATAGCTAATGCTAATTCT 

TGATTAGTGTGGAAAGCCTAATAAGGTTATATTGTGCACAGGTTAACTACCTTAATATAGTTATTGTTAA 

TACAGTTATTGCTGTTGACTACTATTGTTATTGTTAAATTAAAGTGTTAGGTTGAGTTAATTGATTAGTG 

AAAACCAACTAACTACCGTATTAAATTATTGTATTAAGATTGATTCCTATTAAGGATAAAACAGAGAGTG 

TGTTAGAAAGAGAAAGGGTGGATTATAAATACGTGTAAAATCCCCTTTAGAGACTAACCACTAGAAATCT 

ATTGATGGTTTCATAGATAGAGATTAACGATTATATTTATAATATAAGTTGGTAGTTGCTAGTATATTTG 

AAAGCACTACAGTATAGTATGTCAGAATCAGATCATTTAAATTCTACTAATAATACAGGAAACACTTTCA 

TTAGTCTAGATCAAGCCAGTACAATAATGGCAGATCAAACTCAAGGAGCTAACCCACAACACGTCTTCTT 

CAGTATTAGGGAACAACATACTAACTTGACCTTTTCTAGCTTCAACCAAAAATTCCTCTATATCCATTAA 

TGGAATTTCATCAAACTGAGCAGCCCCAAAAAACGTTTTGCTTCCAAAGTCTAAATGAGCATGGAATTTC 

CTTATGAAAGGTATACCAAGTATTAATTTCTTATGGAAGCTGTCCACTACAGCAAAATTCTCTTGGAATG 

TAATACCATTAAACTGGAACTTGAGGTTAATTATTTGGTTAAAGTTTCTGTTGATTTTTGGTCCAATAAA 

GTACCCAAACTACTAGAGCTCCAACAACATTTTCAGAAAATGGCCAATAATACAATAAGTGGGTATATTT 

TATCAAAAGAGTTTATATTATGGTTACTCGACGGTATTATTCTCTGTTGATTTAAGGCATTCTGGTCGAC 

CAGTGGACAAAATTCAAGAGTAGTGTTTGTTTAGACTTTACAGGACATGATAGTATATATAACAAAAATG 

AAATACATTAATCAAAACTAACTAAATCCTAAATTAATGCCAATTTCTATTGAATTGGTTTGCTACTTTG 

TAAAATTTGTGAGTAATCTTAAGTACTTATATGGAAATCAACAATGGCAAAAATACAAGAGAATGACCCC 

ATGACACATTCAGTGCACAATTCATAGTAACTGCTTGGTCACTTGCACATGACTCTGCTAGTATACTCAA 

CCACTCTTGTGACTTCCATATAGATACTCTCGATGAAATGTCTCAAATTAGAGGACAAACAATCTGCTAT 

AATCTTGGCTAATCACCCATGTAACATGGAGGAACCAAACACATAGATATACGGTACCATTTCATACAGA 

ATTTATCACTAAAGAAATTAAGAAAAACTTGTGTTATCAAAGTGGTTTGCGAACTTTGTAGTAAGGGAGA 

GTGTTGAGAATTAGAGATTCTAAGTTCCAGAAAAATATCTATATTTATATATATATAGGTAGTGCAACAC 

TACATAAAAGGGACTGATTTGAATGTATGTATGTCAAATGACACCCTTATAATGTTGAGTGACATCATAT 

CAAAATGGAAATCTACTGTATCAATTAAGAGATTACTAAAAGCAATATACTTAATATGAGGTCGTACTTT 

AAGATTGTGAATAGTATCAGTAGCGAGTGGCTATGTGTTGTGATGGAGCATCACTGGTAGTTTCTTAGAT 

GTAAATCTCAGTGACTATAAGCATACTAAATTAGTTATGAAGATATGTTCCATTAAAGTATTTAAAAAAT 

AATAGACAGGCTATCAATTTCTAATAGATTTACCGTCCAGATTATAAAAAAATTATCGAGATACATATTA 

CACCGATTGAATTAATAATATGTCTACTACAAACCCATCACGGAACTTGATGCAATTGATTGAATAAGTG 

TCTCTCTAACGATGACATGTCCAATTCTAATCAAAATAATTATTATTCTAATTGTAATATCTGGTATTTA 

ATTATTTATAATTCACGAAACAGTTTGATTGGTTTCTGATTCTTCTGACAAAAATAAG 

>retrotransposon_31 1636bp Incyte: 1..1636; zeta-like LTR: <595..1098 

ATGTTTATTTAATAATTAAACCCCAGTTGACCAACTATGAAATAGTATAATGATAAATGCAAAATAAATA 

TAGTATGAACAATATGATAGTTTTAGTGTGAATTTTGAATAAGAAAAAGAAGGGATAAGGATATTTTTAC 

TAGGAAACTCAATTATAATTACTAATGATAAAAACTCCATCAGCTACTATTATTACTCAAATTTTAAATC 

ATTTGTTTATCACCTACACAAACAGGGATTGTCCAATATTGATTACTAAAATTAGAACAAATAAGAGAAT 

ATAATTGAAGTTAAATAATTCTTTTACTAAATCTATTGACCAAGAACTACATCAAGGGAAAGTGTTGCAT 

ATACATCTAATGTTTATTCTTGGTTAGAGTATTGATACAAAATTATATCATCACCAACGAATCACATTAA 

GGGAAAGTGTTGTGCATATACCTGATGCTTAGTCTTGGTTAAAGTATTTGTGTGAAAGGTTATCGTGACC 

AAAGATTATAGTAAGGGAAAGTATTATGAATAAATCCAATGTCTACTTTTACAGAAGTATTGACATGAGA 

GATTATAACTATCAAGAATTGCATTAAGGGAAAGTGTTGTAATATAGCTAATGCTAATTCTTGATTAGTG 

TGGAAAGCCTAATAAGGTTATATTGTGCACAGGTTAACTACCTTAATATAGTTATTGTTAATACAGTTAT 



TGCTGTTGACTACTATTGTTATTGTTAAATTAAAGTGTTAGGTTGAGTTAATTGATTAGTGAAAACCAAC 
TAACTACCGTATTAAATTATTGTATTAAGATTGATTCCTATTAAGGATAAAACAGAGAGTGTGTTAGAAA 
GAGAAAGGGTGGATTATAAATATGTGTAAAATCCCCTTTAGAGACTAACCACTAGAAATCTATTGATGGT 
TTCATATATAGAGATTAACGATTATATTTATAATATAAGTTGGTAGTTGCTAGTATATTTGAAAGCACTA 
CAGTATAGTATGTCAGAATCAGATCAATTAAACTCTACTAATAATACAGGAAACACTTTCATTAGTCTAG 
ATCAAGCCAGTACAATAATGGCAGATCAAACTCAAGGAGGTAACCCACTACAGGTTATGAGCCTCGCCCG 
CTTATTGAATTTAGATAATATAGGGGCAATGAAAGCTTTTGAAAGTGTTGATTTTCCTGAATCATTAAAA 
CTAGAATCCAAGATTAATTTTCAAGTGTGGAGAAATGAAATCCTTAGATATGCACGTGGTATTGGTGCTG 
AGTTTGAAAACTTTGTATTGAATGAAACTCCAGCTCACCTGTATGATCTTAGATTGGGAAATATGCTTCA 
TCAATTATTGATTCGCACTGTGAAAGAAAAAGTTAGAATGCCTAGGCAAGAACTTGGAAAATCAGGAAAA 
GAACTTTATCTTGATCTTATTAAATCATTCGGTACTCAATACCCATACGATAAATTTGAGATAGTTAAAT 
ACTATTGGGATCAGTTAACAAACCCTTTAATTAATGTGAAGAGACGTTTTGAAATTGAAGAAGTATGGGT
TCAATACATTAATGCTCAAACTGCAACAGAGAGAGAAGTTCTTAATTCATTTGTTTGGTTACATTTGTCA 
AAATCTATATTACCACAAGAGTACCT 

>retrotransposon_32 2125bp Incyte: 1..2125; zeta-like LTR: 1105..1612
TGAGTAGCCTTTTCTTGGGCGACTTTATTAGCTTCATCAACAAGACGTTTATCTTCAGCTTCCTTTTCCA 
TAATAATTCTCTTCCATTCTGGAATTGGTTTTGGTTTCTTTTTATTTATCTCCTCTTCTTTCATAGCCAA 
CAAAAGAGTACCCAATAATAATATAATGGTGATACCTTGTGCGTACATTCTTGCTTGAACAGCTTTTTGT 
GCGGTATCCATAATTTTGTCTCTGTTAACCAATACCCAAGAACCATATAAGGAACCAGCCCAAGCACTTA
TGATAATTTTATATTTATTGTCATTCAATACGGTGAAACATTTGTCACTAAGCGATAATCTGTTCCATTC 
ACGGTATTCTTCCAAATATTTAGCTTCCTGATACTCCGATTGATGCATCTTTCTATCGAATTCAACAGAA 
CCTTGATCAGCGAAAAAGGCAGCCACAGAAATTGTTGGCATAGCAATTATGGCTGCTTTGATACTTGGAT
TGAATGTTGCAAATCTTGCTGGATGTCTATGCTTTAAATATTGGTACAAACCGACTGAAAGTGCACCACC
ATAAAACAACCCTTTGGCACCTTCTGAAATAATATGTGAAATGTGAGCGTCTTTTTCTTCTTTGGATAAG
ATCTTCATTGTGGAATTAAGATGACTTTGTGATTAAATTGTTGACTTCTTTAAGCCTTTTAATGTGGAGG
AAAAAGAAAAATCTATAATTAAAAAAAAAAAAGATAAAGCAGATAATTCTTTGATCTTTATATACTTGGT
CTATATGTAGTAGGGGAAAGTCGGAGTCGGAATTTGAAAAAAAAAGAGAAAAAAGAACGAATATTTAGAC
TGTAAAATTCAAACCCCTGCTGATTAGTATATAAAAAAAATGAGTTCATTTTTCCTTTCTTTTTTTTTTT
TTCGCGCGGATAGCAACGGTCATTAAGTTAACGAGATAAAAAAGAAACAACCAGATAATTATGAAAAGTT
GTGATGGTGTCACGTGCGAACATGAGAGTCATGAATTTTGACGAAAACGTCAAGCTTCAGTTTACAAAAG
ACCTCTTTATTAAAATCGAATTGCTTATAGGGTCGTCGATGATGAGAAGGTGTATGTTGTAATATAGCTA
ATGCTAATTCTTGATTAGTGTGGAAAGCCTAATAAGGTTATATTGTGCACAGGTTAACTACCTTAATATA
GTTATTGTTAATACAGTTATTGCTGTTGACTACTATTGTTATTGTTAAATTAAAGTGTTAGGTTGAGTTA
ATTGATTAGTGAAAACCAACTAACTACCGTATTAAATTATTGTATTAAGATTGATTCCTATTAAGGATAA
AACAGAGAGTGTGTTAGAAAGAGAAAGGGTGGATTATAAATATGTGTAAAATCCCCTTTAGAGACTAACC
ACTAGAAATCTATTGATGGTTTCATATATAGAGATTAAAGATTATATTCATAATATAAGTTGGTAGTTGC
TAGTATATTTGAAAGCACTACAGTATAGTATGTCAGAATCAGATCAATTAAACTCTACTAATAATACAGG
AAACACTTTCATTAGTCTAGATCAAGCCAGTACAATAATAGCAGATCAAACTCAAGGAGGTAACCCACAA 
CATAGAATACGTTTTCAACTACTTAAGTATCCACTAACCTAAATTTTTTTTTTAATAAAATTTCATTGTA 
TTAGTCTTTCTTACTGCTTTTAATCAACTATAAGTATAGGTTTCCGTTTTTTTTGCAGTAAAATTTATCG 
TTCAGGAGAAATAACAAAATGTACACGACTTATTCGCAGCATTTTTTTTTTTGTTTTGGGTTTTTGTATC 
AAATTGTTACAACAACAACAACAACCTCAATTCTTAACCAAATCTACCCCTCCTATTTTTTTTTCTCATA 
CACACAATACATCTTACACTATCTTTTGATAGGCTTTATTGAAGAAGTATTTAAGGAGTGTAATGACAAT 
CTGCTTAACTCATATATATATATATAGATAGTAGTCAACAATAGCTTTATCTACTTTTTTTTTTTGGCGA 
CCCCTGCAACTTCAGGCCCACCAGTTTGCCCATTTTGGTGCCCCCATTGAGTAAACATGGGGATTTGGAG 
CACACTTTTTTTTAGGTAAAAATGG 

>retrotransposon_33 1292bp Incyte: 1..1292; san-like LTR: 369..749, CTA2
(transcription factor): join(974..>234, <888..1292)

CTAATCCAAAAATCCATAACCCAACTGCTCAACGGCGAAATCCAAAACTTCCATGCTATTCTAGACCAAA 
CAGTGTCGAAACTCAATGATGCAGAGTGGTGTCTCGGCGTTATGGTTGAAAAGAAAAAGAAACTTGACGA 
ATTGAAAGTCAAAGAAGAAGCGGCAAGAAAGAAGGAAGAAGGGGCAAAGAAAAAGGAAGAAGAGGCAAAG 
AAAAAGGCAGAGGAAGCGAAGAAGTGTTTTATTTTACTTTTCTGTCAAATTTGCACTACTTTTAATTTGT 
GTGCAAATATTCTATTTTACTTGATTTTTATATACTTTTATTTTACAATACTTTTTTATAGGACTTTTTA 
TATCTTTTCTTTATCAACTGTTCGCTATAGGGTAGGTCTTCCAAGCTAATTTTACCCGACACAAGATGAA 



ATATTTTCTGTTGAGCACTCGTTGTCGACAGTGAAAAATTTTCACTCAAGAAAATATTTTATCATCACTT 
TTTCTAGAAGGGAGGTTCAAGTGTTGGAGAATAGACAGCGAACACCTGATATTCCCAAGGTCGAATTAGA 
TTGAAAGATAAATAATAGTCATATTTATTTTGTATTTAGTCAATAAATTATCTTTTTATATTTAAATTCT 
TAGTATTGTCATACCACGTAGATTGATACGGACATACTTAGCACATTTAACATATATTAAGCACCGATTA 
CCTGTGACATTCCGGAGTTTACTGTTTCGCGCACGCTGGCAGACGAACATCAACTCATCTTTTATACAAT 
ATATTCTTACGATTATAACTTTCAATTAAGAAATACAACTTCTTATTAGCATTCTCCTACAAGTTCTTAA 
GTTCCTAGGAATTTCTTCGAAACTATAATTAAAGACGAAAAGTGTAAAACAAACAGAAAGCAGAGGAGGC 
CCAGAAGAAGGCAGAGGAGGCCGTCCCACAAAAGTTTGACAACTTTGACGACTTTATTGGCTTTGACATC 
AACGACATGCAGAACGACGATACCATCGACGATACCATCGACGATACCATCGACGAAACCATCGATGAAA 
CCATCGACGATACCAACGACGAAGACATGTTGTCCAACATGGACTACGAAAATCTAGATCCGGACGAGAC 
CATCGACGAAGTACCTGCCACCACAGACAGCGACTTGGACATGAACAACATACTTGAAAACAACGAGCTG 
ATATTAGACGGGTTGAACATGACATTCCTCGACAATGGCTACAACACCAACCACGTAAACGAAGAGTTTG
ATGTAGACGGCTTTTTAAACCAGTTTGGTAAT 

>retrotransposon_34 568bp Incyte: 1..568; san-like LTR: 113..493

GATTGTATAGTGGTGTGGTTGATCGACTTCAATATAACAAGAGAGAGATGAGATGAGATGCTTTTATCGC 

GTATATATTTTTTTTTCCATTGACAATTCTGATTTCACAAATTGTTCGCTATAGGGTAGGTCTTCCAAGC 

TAATTTTACCCGACACAAGATGAAATATTTTCTGTTGAGCACTCGTTGTCGACAGTGAAAAATTTTCACT 

CAAGAAAATATTTTATCATCACTTTTTCTAGAATGGAGGTTCAAGTGTTGGAGAATAGACAGCGAACACC 

TGATATTCCCAAGGTCGAATTAGATTGAAAGATAAATAATAGTCATATTTATTTTGTATTTAGTCAATAA 

ATTATCTTTTTATATTTAAATTCTTAGTATTGTCATACCACGTAGATTGATACGGACATACTTAGCACAT 

TTAACATATATTAAGCACCGATTACCTGTGACATTCCGGAGTTTACTGTTTCGCGCACGCTGGCAGACGA 

ACAGATTAGAAGCTTGGTAAATCTTTGGTTATTCATCACGTCTTGAGAATAATACAAAGTTTAATATAGT 

ATTTTCAA 

>retrotransposon_35 946bp public: 1..946; san-like LTR: 113..493, CTA2
(transcription factor) C-terminus: <632..946

GATTGTATAGTGGTGTGGTTGATCGACTTCAATATAACAAGAGAGAGATGAGATGAGATGCTTTTATCGC 
GTATATATTTTTTTTTCCATTGACAATTCTGATTTCACAAATTGTTCGCTATAGGGTAGGTCTTCCAAGC 
TAATTTTACCCGACACAAGATGAAATATTTTCTGTTGAGCACTCGTTGTCGACAGTGAAAAATTTTCACT 
CAAGAAAATATTTTATCATCACTTTTTCTAGAATGGAGGTTCAAGTGTTGGAGAATAGACAGCGAACACC 
TGATATTCCCAAGGTCGAATTAGATTGAAAGATAAATAATAGTCATATTTATTTTGTATTTAGTCAATAA 
ATTATCTTTTTATATTTAAATTCTTAGTATTGTCATACCACGTAGATTGATACGGACATACTTAGCACAT 
TTAACATATATTAAGCACCGATTACCTGTGACATTCCGGAGTTTACTGTTTCGCGCACGCTGGCAGACGA 
ACATCAACTCATCTTTTATACAATATATTCTTACGATTATAACTTTCAATTAAGAAATACAACTTCTTAT 
TAGCATTCTCCTACAAGTTCTTAAGTTCCTAGGAAATTCTTCGAAACTATAATTAAAGACGAAAAGTGTA 
AAACAAACAGAAAGCAGAGGAGGCCAAGAAGAAAGCAGAGGAGGCCGCCCCACAAAAGTTTGACAACTTT 
GACGACTTTATTGGCTTTGACATCAACGACAATACCAACGACGAAGACATGTTGTCCAACATGGACTACG 
AGGACCTAAAATTGGACGACAAAGTACATGCCACCACAGACAACAACTTGGACATGAACAACATACTTGA 
AAACGACGAGCTGATACTAGACGGGTTGAACATGACATTGCTCGACAATGGCGACCACGCAAACGAAGAG 
TTTGATGTAGACAGCTTTTTAAACCAGTTTGGCAAT 

>retrotransposon_36 951bp Incyte: 1..951; san-like LTR: 389..769; POL
protein: <1..321 

GATTTGAGAAATACCATTGAAGATCTAGAGTTAAAAATAAGGAATTTGCATGTACATGAGGATAATCAAG 
CGGTCATTACAATCTTAAAGAATGATAATTTCCACCCACATAGACCGATTGATATATGTTACAAATTTCT 
CAGACAAAAATTGAAAGATGGATTTTTTTCAATATCATATGTTGAATCTGGAGATAATTTAGCTGACTCA 
TTCACGAAAGCTTTAGGAAGAAATAAATTGATTGAACATACCAAAAGGATTAGAGAAAGAAAGGATTATG 
ATAATAATGCTACACTGATAGTGGACGTTAGGACGCTCGAAGAGATTAAGATAAACAAGAAATTGGTACA 
TCATTAATTAATTTAGCTGTTTACCTGAATCAGGGGAGTGTTCGCTATAGGGTAGGTCTTCCAAGCTAAT 
TTTACCCGACACAAGATGAAATATTTTCTGTTGAGCACTCGTTGTCGACAGTGAAAAATTTTCACTCAAG 
AAAATATTTTATCATCACTTTTTCTAGAATGGAGGTTCAAGTGTTGGAGAATAGACAGCGAACACCTGAT 
ATTCCCAAGGTCGAATTAGATTGAAAGATAAATAATAGTCATATTTATTTTGTATTTAGTCAATAAATTA 
TCTTTTTATATTTAAATTCTTAGTATTGTCATACCACGTAGATTGATACGGACATACTTAGCACATTTAA 
CATATATTAAGCACCGATTACCTGTGACATTCCGGAGTTTACTGTTTCGCGCACGCTGGCAGACGAACAC 
AAATGCTTGAACTATCTGCCGACTTTTTTTTATTTATGGCGTGAGACATTGTTCTCGCACACGGTTGTGA 
TTTATCTACCAGGCTCTCATATTTAGAGCGACAACTACTTTGAGCAAGCAAAACGCATATCTCACCACAC 



ACCAATTGTAGGCTATTCTCAACCGGAAAGTACAACTAGCA 
>retrotransposon_36 POL protein 107aa 

DLRNTIEDLELKIRHLHVHEDNQAVITILKNDNFHPHRPIDICYKFLRQKLKDGFFSISY^ 
FTKALGRNKLIEHTKRIRERKDYDNNATSIVDVRTLE

>retrotransposon_37 9850bp public: 1..9850; san-like LTR: 369..769; CTA2
terminus: 1..>234, GAG protein: 939..1853, POL protein fragment 1:
1896..2360, POL protein fragment 2: 2509..4893, POL protein fragment 3
(reverse transcriptase): 4953..5723

CTAATCCAAAAATCCATAACCCAACTGCTCAACGGCGAAATCCAAAACTTCCATGCTATTCTAGACCAAA 

CAGTGTCGAAACTCAATGATGCAGAGTGGTGTCTCGGCGTTATGGTTGAAAAGAAAAAGAAACTTGACGA 

ATTGAAAGTCAAAGAAGAAGCGGCAAGAAAGAAGGAAGAAGGGGCAAAGAAAAAGGAAGAAGAGGCAAAG 

AAAAAGGCAGAGGAAGCGAAGAAGTGTTTTATTTTACTTTTCTGTCAAATTTGCACTACTTTTAATTTGT 

GTGCAAATATTCTATTTTACTTGATTTTTATATACTTTTATTTTACAATACTTTTTTATAGGACTTTTTA 

TATCTTTTCTTTATCAACTGTTCGCTATAGGGTAGGTCTTCCAAGCTAATTTTACCCGACACAAGATGAA 

ATATTTTCTGTTGAGCACTCGTTGTCGACAGTGAAAAATTTTCACTCAAGAAAATATTTTATCATCACTT 

TTTCTAGAAGGGAGGTTCAAGTGTTGGAGAATAGACAGCGAACACCTGATATTCCCAAGGTCGAATTAGA 

TTGAAAGATAAATAATAGTCATATTTATTTTGTATTTAGTCAATAAATTATCTTTTTATATTTAAATTCT 

TAGTATTGTCATACCACGTAGATTGATACGGACATACTTAGCACATTTAACATATATTAAGCCCCGATTA 

CCTGTGACATTCCGGAGTTTCTTGTTTCGCGCACGCTGGCAGACGAACAGATTAGAAGCTTGGTAAATCT 

TTGGTTATTCATCACGTCTTGAGAATAATACAAAGTTTAATATAGTATTTTCAAATTTTGGAATACAAAA 

GTTGCTAATTGGTAAATAAGTTATTGATTTATTTCATAAATCTTTTTTGGTATCATATTTCAAAGAGTTG 

CAATTGAAAGCTAAAGACATCCTTATAAATGGCTGAATTTAGCGATGCTGAGCTCAGAAAGATGATGGGT 

ACACTTTCACTCTTGGTACAAGATTCCAGGAGAGAAATTAACCACTTGCATGATAAGTTGGAGAACAATA 

GTGACTCAAAATATCAATCTTTAGAAACGTACATCAACTCAAAGTATGCAGATACTATAAAATCATTTGA 

AAAATTAAAATATTTGGACATTGATAATTCAGAGTTGGTTAATACCTGGATCATGTGTTTTAATCAGGTT 

AAAAGGTTTCACCCTCAGGTTTTTGATGCTTTCATGGAGGCAGAGAACGAGGACGAAATTGGAATCGAAA 

AGATCCAATATACGCCATACACAGGTAAACACTTGAATGATATGATCAGAATCTTCTACATGAAGATATC 

CGAATTAATAGAAAGAAAAGTTAGTCCAAATGTTTCTAGAGAGATGAATGATGGACAGCCACAATTTGTT 

CCGAATTTGTTTAAAAAAGTTTACGAGATGATTATTTCAAAACCAGATGTTTCTGCTGCTGAAAGAATTG 

GAAAAGCTCTTTTCAAGTTACAATCTAAACTGAGAGAACTTGAAAGAGAATCAGCATTTTTGTTATGTCA 

ACATTTAATGACCAATGACCACCAGCACGATGATATTATTCTTAAATTTCTCGTTAGCGGTGTCTCACCA 

TGGTACTTACATCTGCAAATTTACATGCTGTCATATAAACTTGGATTCTCAAATTTGTTTTTAGAGATTT 

ATGCTCAACATTATGAATTGTATAAAGCAGATCCCATTTACAAATTGCCAGATAGTATGACATTGTTGAA 

TGAAATAAGATCAAATAGAGATTATCCTAAAGTGGTAAATGCTGCAAAAAATACAGTACAAGTCAATAAT 

GTTTCATCCAAGAACAATAAAAAGAAGGATGAATGACAACAATTAGCCAATAAAATTGAGGAAGTAGGAC 

GTTATAGCGAAATAAACGCAACATCTACATATCATGAAATTGGCGATACCAACAAAAACAAAGAACAATT 

AATATTGAATTTGAAAAATCATACAAAATTAAGTGAACAAAAGAAGAAAACAAACCTATTGGTATATGAT 

CTGGGAGCCACAGTATCCGTGGTGAATGATAAGACTTTACTTAACGACATTAAAGAATCAAATATCGAAA 

TTGCAACTGCTGAAGGGGAGACATCTACGGCTTATGCTTTAGGTACTCTAACCATATCTGTGAATGGATT 

GAATGCGAAATTAGATGGTGTTCTATACTTGCCATCTATTCAATTAAACTTAATATCTATAAAACAATTT 

GAAGATTTATGCTACGCAATTTTGATTTCCGAAAATCTAATGTGTCTAGTTCACAGTGACCACGGACCTA 

CGGTCATTGCGAAATATTCACCTAAAGATGACTTATACTCAGGCCCAAGATCGGGAACCTTTTTTTAAAA 

GAATTCATAATGACCAAACCCATTTTTTGCTTGCCNCTGCTAAAAAACTTTTAGAATCAGAGACCATATT 

TCTGGAGAATCCCTGAAAAATCCAATGGATTGATCAAGAAAAATTAGATCCGTTGAAAATGACCAATAAA 

GTAGAAAGAGTTACCTATGTCAGCATACGCAACATCAAACAAGAAGTGGCAGACAAATATATGATAAAAG 

ATCTTTACTACTATCATTTATTAATTAATCACCTTTCACATGAAAAACTACAATTATTAGTAAAAAGGGG 

AGTGATTAAACCAGTCAAATCTACTTCGGCTGAGTCGGCCATTTTAAATTGTCAGATATGTGTTGCAGCC 

CATGCAAAATTAGCTAGCCATAATCACACTCAACAACGGGAATTGGAGCGACCATTACAACGCCTCCATT 

TGGATACCGCCGGACCATTTACCTCAAATAAAACTAAGAGCTATCTTACAACCGTGATTGATCAATTTTC 

CAGATATACTGAAGTTATTGTATCTGACACCAAAGCAGTCAAACAAAGCATATTGCATAGACTTAGGGTC 

TGGAACAATAGATTTCAGTTTAAGATCGCGGAGATAAGATATGATAATGCATTGGAGTATCCATCGGCTG 

AGGAGTTAGAGGAGTTAGGAATTTATAAACACCTTCTCCCAAACTACTCTCCTATGCTTAACGGTACAGC 

TGAAGCAACCAACCGCCCCATTGTCCAAGGTATTTATAAGGTAGTGTTAAATTTTAGTTGTCAAGTATTA 

ATACTTTTCCCATTTATAGTGGAGTATGCGGTTCATATCCGGAATCATACACCTATAAAAGAATTTGATG 



GTGCTACTCCTTATGAACGTTACTATGGTTTATCTAAATACGTCATACCATTTTTTCAGTTTGGAACCGA 

CGTTTTGATAAAATGTGCTAGTGTACAAGAAGCTATTTCATTAAAACTACCATCTTCAAGAGATAAAGCT 

TTTCCTACAGTGATGTTTGGTGCTTTTCTCGGTTACGGCTCAGATTCCTTTACCTTCAGAGTTTTAGTTT 

CCACGAAAGGATATCCAGTTATTACAACATCAAACATCCGTCCAATAGCGACGATGCAAGTACTCAATGA 

CTATTTGGCATACATATCGGAGAATAGCTCAATAAGCTATGACGATACATTCTTATCACCTTTGAATCAC 

CCAATGATTCGCACAAACCAACATGATAGACGTGGAGACAATATAAATGTCGAATATGAAAACCGTCCAA 

ATGTACCATTTGAATATCATGCTGAACCTCCTCGTACAAATTCATCGACGGGAATTATCGATCGACCAGA 

TATTAGACCTAGAGCTGATCCCACCTGGCAACGTATGCCTGATGCCAACATACATCAGGAAACAACAACT 

GTACAGACTCCTGATCATGGGGAGTTAGATACCATGATCAACAACGAACACCAACTACCACGATCTGGGG 

AGGGTAATTACCCCGGGCAACAGGTGCGCACCGATATTATTGGGCAATTTCGAGATCGCGGGCCTACCAC 

TCTAAACACTCCGATCGATCTAGGTGTACCCGATGAAACAGACGATATTAGTATGACATCAGAGAATCCA 

ATTGATTCCCCAAATTCCGAGATGATCATATCCCCATCTTTACCCACAAATGAATTGGAACATCAAATCG 

ATATCAGTTCAGGGGAGATGTCGTTATTGCAAACGAATATGGAAGCAGATAACGAATTGAAAACAAATGA 

AATGGTATTATACAAATCAAAAAATGATGGTATTATCATTCAACAACAACAATTCACTGAAAATTTGTCA 

GATGAAAATGAAGAAGATTCATCAACAGATGAGGAAACATTGGAAGACAAAAAACAACAGCGATTGGAAT 

ATAATATTTCACCAAACGATGAGTGGATAAATAATGACGTTCAGAACGAAGATGACACACAAGTGCCACA 

TGTTAAGGAACCAATCAATTATGAAACTCAAAGTAGAAATGAAACAAACATGCCACGAATTGAAATGGGC 

ATAATAGAAAACTTAAGTGATGATGGAAAGAATACACCACGTGAATTACGTATCGTCACCTACGATAATA 

ATAAAGAAATTGAAAAGTACCAAGACAGTAATATCGAGATCCTGGAACCCAGAAACGAAAATGAAAACCA 

GACATTCATTGAAAGCAACTTAGAATTACTTGACAATCAAGAAATGTTTCAAGAAGATCCTCAAGTTGAA 

GATATTCGATTGACAACTCCAAAAAAGGACAAATCGTTATCACCTGATTTCAATCAAACCCATAATGAAA

TACAACTATTCATGGCAGATATCAATGAAGATATGCTAGAAGAATATGATGAAAATATAAATATGAATGA 

AGTGTTAGCTGACTCCACGGAGACGTTGGACAAAGAATTAGATTTAGATGAAGAAAGTGGAAGGATCGAA 

TATATTGCTGATAGAGTTAGAAAAAAGACAGAGGTACTGATGGTGCGCCACACGGGAAATATTTAAAGAA 

AAATGATAAAGATTTTGGTTCAATAAAAAGTCAGAAAAAATCTGACGCACAAATGGATGATGAAGTTGGA 

ATTGCTATTTCGAAGATCAGAAACTTTCCATTTAGATTGAAGGATGGACGAGCAAGTTTCTTCCCTCCAT 

ATAAAACAAAATTTGGAAGATCAGTGCATCCACCTAAAAGATATTTAAATGCCATTGTTAAGAAAATAGA 

TTACAATCAAAAAGAATGGCGTCAAAGTATGGAAGAAGAAATCGAAAAATTTAAGGCTAACCAAGTTTAC 

ACCGTTGAAAAAACACCAAAGAACGTTGTCCCATTGAAAACCATGTGGGTACATACTTACAAAACCAATG 

ACCTCAAAAATCATAATTACAAAAGCCGTTGCGTGGTAATGGGAAACTATATGGTCGAAAATCGTGATTT 

TGATCCCCATGCCATCTCCTCCCCGGTAGTAGATCTCACAAGTATACGACTATTATCTGCCATAGCTGTT 

GAAAATAACTTGGTTATGCACCAATTGGACATCGCCTCAGCTTATTTGAACGCCAGTTTGGAGGATGGAA 

GAGTAATCTTTGTGAGACCACCGCGTGGTTTTGAGGTTAAACCTGGCTATAGTTGGCGTTTACACAAGTC 

TGTGTACGGTCTTAGGCAGAGTGCCCATAATTGGTACTCACATTTTAAGAATGTGTTGGAGGCAAATGGT 

TTAAAACAAACACTACACAATGATGGCATTTTTTGGAAAAATTATGAAAATGGAGATGTATTATATGTGA 

GTGTATATGTGGATGATGTTTTTATCAAAGCGAATTCAATGAGTTTGTGCAACTAAATTTAGAGTTGCTT 

TTAGTTTACTAAACAAATTTTATCCTTGCTAATCAATACTATCTATTATGCACGATCTAGCAACCTTAAA 

ACAACCAATGGAAAAATTAAAAAAATTCCCTCATCAATCTGGCATGTTCGAATTGAAAAAAAAAAAAGAA 

AACAATAGAAATTCAATACAATAGAGCATAGAACTGGCCAGAATGTGAGACAATAAGTCAGAACAAGTGA 

TTGCCAGTATAGGTAGGGAGAAGCAACAAAGAGAGTTTACACAGCTGAAAACAATCATATCGACGGTTAT 

TGCAACTTGGTTGCTATTTCAACTATTCGTAATGGTCCCATTTTTAGCCAACACAATTTCAGAGAAGACG 

CGAAAAAGGACTTGGAAACTTCATAGTTTAGAGCCACAAACTAT7VAGAAATAATAGTACGATCTAAATTG 

GTTCCCTAGGATAATGCCCAACAAAGAAATCCCCCAAATAATTGTAAATTGTTCAACCTTAGTAACTCTA 

TCTAGCATTGCGGAGTTCCTTGAAAATGAATTGGTTTGGTGTTCCTACCTGTTCAGTACTTAATCACTAA 

CTAGACAAATTCTTTGGCGAAAGCTCAACTTTTGTGAAGGTCTTTCTCTACTATGAACATGACTCCCAGC 

AAGTCTAGGTTTGGCTGCACTATGAGTTTAATTTAGTTTTATCGGGCTAATACTACTTATTTCCGTTATC 

GGTGTGACCCCCGAAGAAAGGGTATTACGGGGCTCATAATTTTTTTTTTTTTGGCAAGTAGAGTGAGATT 

CAAAAAAGAAAAGTGAACCAGAGCAATAATTGCTATTAATTTTAGTTTTTTACTCACTAGCTATACTTGG 

CTCCCAAACTGATTTTGTAACCCTTTGAGCAAGGTTGTTGGTCAACTGCAAGATCAACTAAGCAAGATCA 

CGCCTTATACGCAAGCCCTGCCAAAAAATAATTCACTCTTGAAACAAGGAATTAGCAGCTATTAGGTAGA 

AAGAGAAAACAATAGAGAAGGGGTTTAGTTGATTTTCCAATACATTTTAGTGCTGAATTACATTTATCTA 
TTTAGTTTAGTTCCATAATCTTTCTAATATTGTTGAACCATTAGCAAACTTTTTAGATTAAAAGCTCTTT 
TGTAACTGTTTTTTTTCTGTAGTTATCGCGTAACCTTTCCCCCTCAGAATTTCTAAACCCTCCCCCCCCT 



TTCTTCAAAACATTAAAGACTTTGAACTTTATCATCACCACAAAAACTTATTAAGCTCCAGCAAATTTCA 

GGTGACACCAAGGAAAACAACAATTAACATTCTTGGAGTTAAGAGTATATGCTGGTGCATGGATTAAATA 

TGCCTGTTCTTAACCCCAGCGAAAAGAATATGTTATTTTTGAACAAAAAAATAGAATATCTCAAATAAAT 

TTGTTCTCCCCTTTTGTCTATCTATCCCTTTAGCTTTTTGCCAAATTCCAACACAAAATGCTTTAGTCTG 

CAGAAATGATGACTAAAATATTCCTTTTCTTCAAAATTCATATTTTCAAAATTTAGCAAATGGTTGTACT 

AGATATCAGAATTTTATCTGGTGAGTTTACTCAACCATAGTAGTCTTTTTTTAGATCAAAAATTAGACTT 

ATGAACCCTATATTGAATAAAGTTAGTGTTCCCCACAGCTATTCATAATAAAAAAGCTTAACAAAAAGTT 

GAGATTATCAGCGACGATCGATCATGTCGTTCCAGAGATTGTGTTATAGCGCCTCCTTATGAACAGGTAA 

ACTATTAGTTGCATGTAGATCTATTGTGTTCAAATTTAAATTTTAAGAATTGTTAGCTCAAAACAAAGAC 

GACCTGAAATTCCAAAAATCATAAAGTTTACCCCCAAAAAAGTAACGACAATAAAGGTGCACCAAGAAAT 

AATGGTTGTAGTTTTTCCTTTATCTGTTTTAGATTGCTTTATTAGGGGGTATCACTAATTAGCAATTGTA 

GCCCTTGCTCGTTATTGTTGCTTGATTTTTTCTAAAAACATTTGCTTAGCATTATTGTTGTAAGACATAT 

TTATCTATTGTTTCTCACCCTTTTAGACAAATGATTAGCGCCCCTTGACACGATCACAGCCTATTGTTTG 

GTGCACTATTTGAGCTTTAAAGTACTAACTTGTTTTCAGACTATCAATCTATGTGTTTGTTCAAAGCCAG 

GCACTCGAGTCATTAGTCAACAATAGGCTGTATGTTGCTATCCATGTAGTGCCTTGTCTACAGAAATTTG 

CTTTTTTAATTCACAAGCATGAGATTTTTTGTTTGTGTGGTATTTGACGTAAATGTAACATGATTACTTG 

AAATTCGATACGATCTTTTTCGTCGTCTATACAAAATTTATCAAGTGCTACTCTGTGATATTTTGCAAAA 

CCAATCTCATTGTTCCTTGCATGAGAATGATTTCGTTGTCATCAAAGAAATATAAGCTTTCATTACCACA 

ACAAATAGCACATGGTACTACCTTCCCAATTAAAGTATGATGTAACCGTCGTTGTCCCCTTATGTCAAAT 

GCAAAGTGAACATTCAAACTTAAATGCGAGCAAGAGCAATTATAATATTACTTCTTCTAGCTTTACAAAA 

TAATATTTTCATCATTTCTGAGTTTATTAGTAGAAACGTTAATATTATTTCAGAAAAGACTACAATAAAT 

TATTGGGGTAATTCTTAGCGGTAGGTTCTCCTGCCCACGAGTGCTTTGCACTGTAGGTTAAATTTATTTC 

TTCAGGATATTCCTACCCCTCTAGGTTGTACTAACCATTGATAATTACTTGCAAATATTTTTTTCAAAAA 

AAGAAAACCCTTTACATAAATAAGCTTTATATAATTATACGTTGAAAAATGACCCTAATTAGTGTGCAGT 

TTTCAAATCTTAAATGTTTCTCTACCCAATGATTACAGAGATCATCAACACTTGTGAATGGACATCATAT 

CTGTACGCTTTTCTAGGCTGCGAAATTATGTAACTTCTTGGTGTACAAAAAATTGCAACCCCTAAGAAAA 

TCATAAGTTTATATCCAAGAAAAAAATGGTTTATAAGCGTATAATGAAAATAATAATATTATTAACCACG 

ATGGCCAAAAGAAATCTAAAGTTGGCAATAATTCGCTAGTTGGGGGGAAGTTGCCAATAATAAATGAGCA 

GGCGTTTTGATATTTATAATAATAGGTCACCTGTTTTGAGTATTTCCTACAGGGACTTTTATTTTCATAA 

GGTGGATATGCTATCACTTGGTGAAACAACTTCAAATTCGTGTACTTTGCTTATGCCAGATACTTAGCAC 

TGGGAAATTGTTACAACCCCATTTCTGGAAATGTAACGTCACCTGAAACCATCTTATGGTCCTGCCATTG 

GTGTTTCATCGTGTTACAATGCTAGGTTTTTTAAATGTCTACAAGTCAATATTATATTCAAGATAAACTT 

TTCAAAACATCTGATTTATTATGACATTATTCTTGTTGACATTTTTTTGGGGTAGACAAGAAATAATTGC 

AGATAATATAGAACACTTATGCCACGTGGGTGGATTTAATAGAATCCTTGTAAAATATTATCTCTAGAGA 

ATTATAAGGGGAGGAGAGAAGATCTATGGCAATGCAAGAAAATGCAAGATCATCGTAAAAAAAGTATAAG 

AATGACTCCATAAGATATATAAACCCACTTGTTTGAAGAGCGCTTACTACACGGGGTTGTCTTAATACAA 

AGGCGGCAGGGTTGCAGTACTTCTGTAGTTTCTAACCTTTGTATTCCTTAGGCCCTGGAATATAATACTT 

CCTGTAGTAAATGTCGGAGTTTAAATTGCTGACATTGCAAGAAAATAAAACCAATATAATATTTTTTATG 

TCACGAAAGAAATGGAACAACAATGTAGCACCAAAAGGGGTAGAGACTAGGCAGTACTATATTTGGAGGT 

AAAAGTATATTAGAAAAAGAACCTATACATGAACCAGTAACCATAACAAAAAAAAACTAAACCCAAGCAA 

TTAACCATCCAAATTTAACCCGTTTTATAATACAATTTTGACCACATCTA 

>retrotransposon_37 GAG 305aa

MAEFSDAELRKMMGTLSLLVQDSRREIlffiLHD^^ 

SELWTWIMCFNQVKRFHPQVFDAFMEAENEDEIGIEKIQYTPYTGKHLNDMIRIFYMK 
NVSREiyQTOGQPQFVPNLFKKVYEMIISKPDVSAAERIGKALFKLQS 

DDIILKFLVSGVSPWYLHSQIYMSSYKLGFSNLFLEIYAQHYELYKADPIYKLPDSMTLLNEIRSNRDYP 
KVWAAKNTVQVNNVSSKMNKKKDE

>retrotransposon_37 POL fragment 1 155aa 

SEINATSTYHEIGDTOKNKEQLII^LK^^ 

TAEGETSTAYALGTLTISVNGLNAKLDGVLYLPSIQLNLISIKQFEDLCYAILISENLMCLVHSDHGPTV
IAKYSPKDDLYSGPR

>retrotransposon_37 POL fragment 2 795aa 
MTNKVERVTYVSIRNIKQEVADKYMIKDLYYYHLLINHLSHEKLQLLV 

CVAAHAKIASHNHTQQRELERPLQRLHLDTAGPFTSNKTKSYLTTVIDQFSRYTEVIVSDTKAVKQSILH 



RLRVWNM^FQFKIAEIRYDNALEYPSAEELEELGIYKHLLPNYSPMLNGTAEATNRPIVQGIYP 

CQVLILFPFIVEYAVHIRNHTPIKEFDGATPYERYYGLSKYVIPFFQFGTDVLIKCASVQEAISLKLPSS 

RDKAFPTVMFGAFLGYGSDSFTFRVLVSTKGYPVITTSNIRPIATMQVLNDYLAYISENSSISYDDTFLS

PLmPMIRTNQHDRRGDNINVEYENRPWPFEYHAEPPRTNSSTGIIDRPDIRPRADPTWQRMPDANIHQ 

ETTTVQTPDHGELDTMINNEHQLPRSGEGNYPGQQVRTDIIGQFRDRGPTTLNTPIDLGVPDETDDISMT 

SElSrPIDSPNSEMIISPSLPTNELEHQIDISSGEMSLLQTNMEADNELKTNEiyEVLY^ 

ENLSDENEEDSSTDEETLEDKKQQRLEYNISPNDE^^ 

IEMGIIENLSDDGKNTPRELRIVTYDNNKEIEKYQDSNIEISEPRNENENQTFIESNLELLDNQEMFQED 

PQVEDIRLTTPKKDKSLSPDFNQTHNEIQLFI^ 

GRIEYIADRVRKKTEVSMVRHTGNI 

>retrotransposon_37 POL fragment 3 (reverse transcriptase) 257aa

MDDEVGIAISKIRNFPFRLKDGRASFFPPYKTKFGRSVHPPKRYLNAIVKKIDYNQKEWRQSMEEEIEKF 
KANQWTVEKTPKNWPLCT^ 

LSAIAVEIOTLVMHQLDIASAYL^^ 

VLEANGLKQTLHNDGIFWKNYENGDVLYVSVYVDDVFIKANSMSLCN 

>retrotransposon_38 3159bp public: 1..2084, Incyte: 2085..3159; san-like LTR: 2638..3019

AATCTGTCCACCTCGTTTTGAGAGGTTCTCAAAATTCTTTGTAATTTTCAAACTTCACCTTTGGCTTTGT 

AAAGTTGGTTTTTTAAGGAATAGCTTTGATTATTTGACATTGCAAACAGTATAGTCAAGATGCACACAGA 

TTGGACCTGAAATTATTCCTTCGCAAAAACTTAAAATAACCCAAATATTAAACATCCACTCGGATTCAAA 

TACCTCAGCACTCTTTTATAGGCACTTGTATAATTTGTTATATGAATCATTTCCAGCTTCCTTGTAGAAC 

CGCCAAATATTTGAATCACATGGGAAACAGATTTGACCATCTAACTTTCATGGTTCTTATGAAAAAGATC 

TGGAAATGGTGATATAGCTTGATTGTCTAGCATATTCAGCGATTACCCTATTTTGTGGTTGCCTGGGATA 

ACCCCTGGCTGTTGTTGGAAAAGACTCGTGACAAGTATTTTTGCCCACGAGTTTCTAATTACTGCGATAT 

TATCCAGTTACATTTTCGCAACTCGTTCTACTTGAGCTCCTTCTATGAATCAACTAGCTGGCTATTTCCC 

TGGATAGAAAACCTTCATTCTTCTTCTCCTGGTTGAGTATCACCGACTTGTGGCCGTACCGTTCAACCCC 

CTACAATACACCATCAACTTTATACTTGTAATACTCGGCTTTGCCACTCCCCAAACTAACCACTATAAGT 

TCATACTCCTTGGCTTGCTTGACTTTCCTATTTCTTAACCCACTACTCTTCTGTACCACTCCGATCATCA 

GATTGACAGAGGTTACTTCATACCCAACAACATTTTCATACCAGTCGACCTTCTCCTCTGCACCACCAAA 

CCCAACACATCGGATTTCCCTGGGATCTCTCTCAACTCTCAAACATATTGCTTTCTTATCTACCCTGAAC 

GTGTGCACCACTACCACCCCTTCTATCTCATATACCACACTGAACGATGAGATCGCAGCACTCCCACAAA

ACCGACAATGCAGCGGCTCAGGATACGACACCCTCAACGAGTTCACCTTCATATTCCCGACCCCAAACAG 

TTTGATGACCACCCCCGTGTTCACATCTATAAGCTGACACTCTAACCCGTCAACACGTATAAAGAACCCC 

ACAAACTCAACCGGAAATATCCCACACAGTTTCAGGGGCGCCACCTCTAGCTTTCTGCTCTTCATGCTGT 

TGTTGACGATGTTCACCACAATAATATCCAACTCCTTCGTCTGCACAACAATTCTATCCATCACCCTTGG 

TGTTCTTATCTTTATTGCACAGACCAACTGCTGCTTCACATCATAACTCTGTACTTTCCCATCATTACAC 

GACACAACAAGTATCTCCCCACTATCCATGACCATCACAAACTCTTCCCTACTAGTCCTCTCACGCTGTT 

TCTGTCCAAACGATTTCATCTGTATTGGTGGCGGAAAGTTCGCATTGATCAGCGAATTTACCGACGACAT 

TGACGCATCACTGCCCCTCCTCTTTCTAATCATTTTACGTGCTAAAAACCCCGGCACAGTTCTCCGCCTG 

AAAAACGACTCCAACACTTTACCTCGAAAGTGCACCGACAGTGTCCACTTCAACTCCCGCTTGTCATAAC 

CCTGTATGACACCCTGTCTAGTACTCACCAACACAACCATACTCCCATCATCATTGAGCCCCACATGGCT 

GACCGGCCACATCTGACAGGGTATGGCTAGTGGTTCAGGGTCGTAACAGTACTCGACATCTTGGGGTTGG 

TAGTGATATATCTGAACTCGTATCCATCATATAACTCTTCTCCTCAGCAAACTCAATGGCCTGGGTTTTT 

GCCGGAACCACTAGTGCAACCACCAACAAGAGGTACTCCACATAGTAAATGTACGTGTTAGACTGGGAAA 

CAACCACACTGGTTTGGTCGACTCAGCACGCTATTCATCAACAATACCCCCAACAGAATCACCAAGTTAT 

TTGTCAGCCTCAGTTTGTACTTCCACCACTGACCCCACCACCGCATAGTTCACCAAAAGGGTCTTGCATA 

ATCCACGTCCCACCATATCACTTCAACTCCCATATTCCTCGATGCAAGAATAACCACAATAATCGGCTTT 

CGTAAACGTCGTCAGTGGCTCAAACACATTGCTGCACCTTGAGCTCTAGAACAACCCCACACTCACTAGC 

CATCGCCACACCAACAACCAAATTGCTGATCCAGAAAAAATACCACCCCCGTAGTCCGGCTTGTATGGAA 

TAATTGCTTGGCCAGGTACGTCCCCACCTCATCGTGTCTTTTCTGGTTGAAATATGTCATCTCCCGGGCT 

AACAGTACCGTATCTCTGTGGCTGGGGCATCTATACTCTTTCATTCTCGGCTTACAAATCTATCTTGTTC 

ACACATTTCATATATCTGGGACTTGTCGAACTCTCTGCACTCTATCATAAACTGGAACTCGCTTGCATTC 

TGGGACACACACTGGAGCTGGAATCCATGGTCAGGAAATGTGAAAATTTTCTTCTCGGGAAATATTTGTG 

ACAATTAGTCCTAGTACACGATAGTTTCATTACGCCCACTAAAAGTGTCTACTGAAACTCGGTCTCTATA 



TCGTCAATATCTTTCATTTCTCTTCCTGGCTTTTCACTGCGACTTATTGTTCGCTATAGGGTAGGTCTTC 

CAAGCTAATTTTACCCGACACAAGATGAAATATTTTCTGTTGAGCACTCGTTGTCGACAGTGAAAAATTT 

TCACTCAAGAAAATATTTTCATCATCACTTTTTCTAGAAAGGAGGTTCAAGTGTTGGAGAATAGACAGCG 

AACACCTGATATTCCCAAGGTCGAATTAGATTGAAAGATAAATAATAGTCATATTTATTTTGTATTTAGT 

CAATAAATTATCTTTTTATATTTAAATTCTTAGTATTGTCATACCACGTAGATTGATACGGACATACTTA 

GCACATTTAACATATATTAAGCACCGATTACCTGTGACATTCCGAAGTTTACTGTTTCGCGCACGCTGGC 

AGACGAACACTTATCAAGGTGCTACTCCCGCGCATCAGTTTCCTCTGGGTTCTCTTTTTGATCTTGGTGA 

ACTACCTTTTTTTCCCACTCGCGTGAGAAGTTCAACACTTTTTTTTACCCATCCACCAAACTTTATTCTT 
TTCCCCACC 




Name | Length (bp) | Regions of interest | Remarks | Novelty

AF041469 (280 bp) Candida albicans retrotransposon long terminal repeat kappa, complete sequence
retrotransposon_01 | 994 | LTR kappa: 548..927 |  | partial sequence present in public domain
retrotransposon_02 | 1348 | LTR kappa: 764..1043; POL (contains stop codons): <136..714 |  | partial sequence present in public domain
retrotransposon_03 | 3034 | LTR kappa: 75..354 |  | complete sequence present in public domain, identity 99%



AF043301 (5624 bp) Candida albicans retrotransposon-like element Tca1, complete sequence
retrotransposon_04 | 3504 | Tca1-like LTR: 688..1075 |  | complete sequence present in public domain, identity 99%
retrotransposon_05 | 3955 | Tca1-like LTR: 2656..3043 |  | complete sequence present in public domain, identity 99%
retrotransposon_06 | 1434 | Tca1-like LTR: 87..475 |  | complete sequence present in public domain, identity 100%
retrotransposon_07 | 1606 | Tca1-like LTR: 1046..1433 |  | complete sequence present in public domain, identity 98%



AF050215 (6980 bp) Candida albicans Tca2 retrotransposon gag polyprotein (gag) and pol polyprotein (pol) genes, complete cds
retrotransposon_08 | 1385 | Tca2-like LTR: 49..328 |  | partial sequence present in public domain
retrotransposon_09 | 1483 | Tca2-like LTR: 871..1150 |  | complete sequence present in public domain, identity 99%
retrotransposon_10 | 879 | Tca2-like LTR: 326..605 |  | complete sequence present in public domain, identity 100%
retrotransposon_11 | 974 | Tca2-like LTR: 483..761; CTA2 (transcription factor): join(<974..>778,<223..>1) |  | partial sequence present in public domain
retrotransposon_12 | 3868 | Tca2-like LTR: 127..407 |  | complete sequence present in public domain, identity 99%
retrotransposon_13 | 469 | Tca2-like LTR: 75..355 |  | complete sequence present in public domain, identity 99%



AF061157 (583 bp) Candida albicans retrotransposon Tca3 reverse transcriptase (pol) gene, partial cds
retrotransposon_14 | [illegible] | Tca3-like LTR: 1509..1822 |  | partial sequence present in public domain
retrotransposon_15 | 2093 | Tca3-like LTR: 1565..1878 |  | partial sequence present in public domain
retrotransposon_16 | 2099 |  |  | complete sequence present in public domain, identity 100%
retrotransposon_17 | 3284 | Tca3-like LTR: 2750..3063 |  | partial sequence present in public domain
retrotransposon_18 | 791 | Tca3-like LTR: 277..590 |  | partial sequence present in public domain
retrotransposon_19 | 4581 | Tca3-like LTR: 2725..3037 |  | partial sequence present in public domain



AF065434 (1145 bp) Candida albicans retrotransposon Tca5 reverse transcriptase (pol) gene, partial cds
retrotransposon_20 | 5325 | POL protein: rearranged |  | partial sequence present in public domain



AF069450 (508 bp) Candida albicans retrotransposon long terminal repeat zeta, complete sequence
retrotransposon_21 | 2027 | LTR zeta: 1384..1891 |  | partial sequence present in public domain
retrotransposon_22 | 2118 | LTR zeta: 1419..1927 |  | partial sequence present in public domain
retrotransposon_23 | 4929 | LTR zeta: 2990..3497 |  | complete sequence present in public domain, identity 100%
retrotransposon_24 | 4954 | LTR zeta: 256..763 |  | complete sequence present in public domain, identity 100%
retrotransposon_25 | 1047 | LTR zeta: 314..822 |  | complete sequence present in public domain, identity 100%
retrotransposon_26 | 7929 | LTR zeta: 3346..3853 |  | partial sequence present in public domain
retrotransposon_27 | 2292 | LTR zeta: 1327..1834 |  | partial sequence present in public domain
retrotransposon_28 | 2025 | LTR zeta: <794..1294 |  | partial sequence present in public domain
retrotransposon_29 | 2731 | LTR zeta: 380..887 |  | complete sequence present in public domain, identity 100%
retrotransposon_30 | 2?58 | [illegible] |  | partial sequence present in public domain
retrotransposon_31 | 1636 | LTR zeta: <595..1098 |  | partial sequence present in public domain
retrotransposon_32 | 2125 | LTR zeta: 1105..1612 |  | partial sequence present in public domain



AF074943 (381 bp) Candida albicans retrotransposon long terminal repeat san, complete sequence
retrotransposon_33 | 1292 | LTR san: 369..749; CTA2 (transcription factor): join(974..>234,<888..1292) |  | partial sequence present in public domain
retrotransposon_34 | 568 | LTR san: 113..493 |  | partial sequence present in public domain
retrotransposon_35 | 946 | LTR san: 113..493; CTA2 (transcription factor) C-terminus: <632..946 |  | complete sequence present in public domain, identity 100%
[illegible] | [illegible] | [illegible] |  | [illegible] public domain
[illegible] | [illegible] | [illegible] |  | complete sequence present in public domain, identity [illegible]
retrotransposon_38 | 3159 | LTR san: 2638..3019 |  | complete sequence present in public domain, identity 99%

AF078809 (1470 bp) Candida albicans Tca4 retrotransposon reverse transcriptase (pol) gene, partial cds
retrotransposon_36 (see above)
retrotransposon_37 (see above)
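The coordinates in the "Regions of interest" column look like 1-based, inclusive GenBank-style feature locations ("start..end", with "<" or ">" marking a partial end); the table itself does not define its notation, so this reading is an assumption. Under that assumption, a short Python sketch can check a sampled row's LTR span against the length of the public-domain LTR record named in its group header. The helper `span_length` and the `CHECKS` list are hypothetical names introduced here, only simple single ranges are handled (not `join(...)`), and only a handful of rows are sampled.

```python
import re

# Assumption: locations are 1-based and inclusive, GenBank style.
# An optional "<" or ">" marks a partial end; it does not change the span.
SIMPLE_RANGE = re.compile(r"^[<>]?(\d+)\.\.[<>]?(\d+)$")


def span_length(location: str) -> int:
    """Return the inclusive length in bp of a simple 'start..end' range.

    join(...) expressions are rejected; only single ranges are accepted.
    A range written end-first (as inside some CTA2 entries) gives the same span.
    """
    match = SIMPLE_RANGE.match(location.strip())
    if match is None:
        raise ValueError(f"not a simple start..end range: {location!r}")
    start, end = int(match.group(1)), int(match.group(2))
    return abs(end - start) + 1


# (row, LTR location copied from the table, length of the public-domain LTR record)
CHECKS = [
    ("retrotransposon_03, LTR kappa", "75..354", 280),
    ("retrotransposon_21, LTR zeta", "1384..1891", 508),
    ("retrotransposon_33, LTR san", "369..749", 381),
    ("retrotransposon_01, LTR kappa", "548..927", 280),
]

if __name__ == "__main__":
    for row, location, reference in CHECKS:
        length = span_length(location)
        note = "" if length == reference else "  <- differs from reference"
        print(f"{row}: {location} = {length} bp (record: {reference} bp){note}")
```

In this sketch the first three sampled rows reproduce the reference lengths (280, 508 and 381 bp). The retrotransposon_01 kappa entry, 548..927, spans 380 bp rather than 280 bp; this may reflect a misprint in the source table, but the source does not say.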
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DECLARATION FOR PATENT APPLICATION AND POWER OF ATTORNEY 

(Includes reference to PCT International Applications) 

FROMMER LAWRENCE & HAUG, LLP 
File No.: 674521-2001.1 

As a below named inventor, I hereby declare that: 

My residence, post office address and citizenship are as stated below next to my name. 

I believe I am an original, first and joint inventor (if plural, names are listed below) of the 
subject matter which is claimed and for which a patent is sought on the invention ENTITLED: 

AN UNUSUAL RETROTRANSPOSON FROM THE YEAST CANDIDA ALBICANS 

the specification of which: 

□ is attached hereto 

X was filed on October 29, 1999 as:

X United States Application Serial No. 

□ PCT Application No. 

X with amendments through DATE EVEN HEREWITH (if applicable, give details).

I hereby state that I have reviewed and understand the contents of the above-identified 
specification, including the claims, as amended by any amendment referred to above. 

I acknowledge the duty to disclose to the United States Patent and Trademark Office all 
information known to me to be material to patentability as defined in Title 37, Code of Federal 
Regulations, § 1.56. 

I hereby claim foreign priority benefits under Title 35, United States Code § 119(a)-(d)
or § 365(b) of any foreign application(s) for patent or inventor's certificate, or § 365(a) of any
PCT International application(s) designating at least one country other than the United States of
America listed below and have also identified below any foreign application for patent or
inventor's certificate or any PCT International application(s) designating at least one country other
than the United States of America filed by me on the same subject matter having a filing date
before that of the application(s) on which priority is claimed:

Prior Foreign/PCT Application(s) [list additional applications on separate page]: 

Country (or PCT)   Application Number   Filed (Day/Month/Year)   Priority Claimed: Yes   No
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AND POWER OF ATTORNEY 674521-2001.1 

I hereby claim the benefit under 35 U.S.C. § 119(e) of any United States provisional
application(s) listed below.

60/106,342 OCTOBER 30, 1998

(Application Number) (Filing Date) 

I hereby claim the benefit under Title 35, United States Code § 120 of any United States
application(s) or § 365(c) of any PCT international application(s) designating the United States
of America that is/are listed below and, insofar as the subject matter of each of the claims of this
application is not disclosed in that/those prior United States or PCT International application(s)
in the manner provided by the first paragraph of Title 35, United States Code § 112, I
acknowledge the duty to disclose to the United States Patent and Trademark Office all
information known to me to be material to patentability as defined in Title 37, Code of Federal
Regulations, § 1.56 which became available between the filing date of the prior application and
the national or PCT international filing date of this application:

Prior U.S. (or U.S.-designating PCT) Application(s) [list additional applications on separate page]: 

U.S. Serial No.   Filed (Day/Month/Year)   PCT Application No.   Status (patented, pending, abandoned)



I hereby appoint Thomas J. Kowalski, Registration No. 32,147, and FROMMER 
LAWRENCE & HAUG, LLP or their duly appointed associates, my attorneys or agents, with 
full power of substitution and revocation, to prosecute this application, to make alterations and 
amendments therein, to file continuation and divisional applications thereof, to receive the 
Patent, and to transact all business in the Patent and Trademark Office and in the Courts in 
connection therewith, and to insert the Serial Number of the application in the space provided 
above, and specify that all communications about the application are to be directed to the 
following correspondence address: 

Thomas J. Kowalski, Esq.
c/o FROMMER LAWRENCE & HAUG, LLP
745 Fifth Avenue
New York, NY 10151
FAX (212) 588-0500

Direct all telephone calls to: (212) 588-0800, to the attention of: Thomas J. Kowalski

I hereby declare that all statements made herein of my own knowledge are true and that 
all statements made on information and belief are believed to be true; and further that these 
statements were made with the knowledge that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States 
Code and that such willful false statements may jeopardize the validity of the application or any 
patent issued thereon. 






INVENTOR(S): 

Signature: Date: 

Full name of first inventor: Russell Tony Masell POULTER 

Residence: 69 Irvine Road, RD2, Dunedin, NEW ZEALAND 
Citizenship: British 



Signature: Date: 

Full name of second inventor: Walter Herman Maria Louis LUYTEN 

Residence: Turnhoutseweg 30, B-2340 Beerse, BELGIUM 
Citizenship: Belgium 



Signature: Date: 

Full name of third inventor: Marianne Denise DE BACKER 

Residence: Turnhoutseweg 30, B-2340 Beerse, BELGIUM 
Citizenship: Belgium 



Signature: Date: 

Full name of fourth inventor: Bart Josef Maria NELISSEN 

Residence: Turnhoutseweg 30, B-2340 Beerse, BELGIUM 
Citizenship: Belgium



Post Office Address(es) of inventors [if different from residence]: 

NOTE: In order to qualify for reduced fees available to Small Entities, each inventor and any 
other individual or entity having rights to the invention must also sign an appropriate separate 
"Verified Statement (Declaration) Claiming [or Supporting a Claim by Another for] Small Entity 
Status" form [e.g. for Independent Inventor, Small Business Concern, Nonprofit Organization, 
Individual Non-Inventor]. 
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